Materials and methods

The maize genome was the model for the simulation. The
simulated genome contained ten 200-cM chromosomes. Simulation
of crossing over was based on a Poisson distribution with
a mean of 2.0 (λ = 2) (Hanson, 1959), which, on average,
generates one crossover for every 100-cM length. The simulations
reported here assume no interference. Codominant genetic
markers were evenly distributed in the genome, and sites
of the donor genes were randomly assigned to genome locations.
Simulations were conducted with the following parameters:

Number of progeny: 100 or 500.
Backcross generations: BC1, BC2, and BC3.
Number of markers: 20, 40, 80, or 100.
Number selected to form the next BC generation: 1 or 5.

Selection was based on 1) presence of the donor allele and 2)
high %RP. %RP was calculated as the average of the (one or
five) selected individuals. Values presented are the mean of 50
simulations.
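
The crossover model described above can be sketched in a few lines of
Python. This is a minimal illustration, not the simulation program used
for the study: the function names, the random seed and the number of
replicates are assumptions. Crossovers are generated as a Poisson process
with a mean spacing of 100 cM, which gives a Poisson-distributed count
with mean 2.0 on a 200-cM chromosome and no interference.

```python
import random

CHROM_CM = 200.0    # chromosome length given in the text
CM_PER_XO = 100.0   # mean spacing implied by a Poisson mean of 2.0 per 200 cM


def crossover_positions(rng):
    """Return sorted crossover sites (cM) on one chromosome, no interference."""
    sites = []
    pos = rng.expovariate(1.0 / CM_PER_XO)
    while pos < CHROM_CM:
        sites.append(pos)
        pos += rng.expovariate(1.0 / CM_PER_XO)
    return sites


def donor_fraction(rng):
    """Fraction of one gamete chromosome that is donor-derived when the
    parent is heterozygous (recurrent parent x donor) along its whole length."""
    on_donor = rng.random() < 0.5          # strand at the start of the chromosome
    donor_cm, start = 0.0, 0.0
    for site in crossover_positions(rng) + [CHROM_CM]:
        if on_donor:
            donor_cm += site - start
        start, on_donor = site, not on_donor   # switch strand at each crossover
    return donor_cm / CHROM_CM


rng = random.Random(1)   # arbitrary seed, for repeatability only
n = 20000
mean_xo = sum(len(crossover_positions(rng)) for _ in range(n)) / n
mean_donor = sum(donor_fraction(rng) for _ in range(n)) / n
print(f"mean crossovers per chromosome: {mean_xo:.2f}")
print(f"mean donor fraction per gamete: {mean_donor:.3f}")
```

The mean crossover count should come out close to 2.0 and the mean donor
fraction close to one half, which is the behaviour the stated model implies
for an unselected gamete.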

Results

In the computer simulation study, all methods modeled
greatly increased the speed of recovering the RP genome
compared to the expected recovery with no marker-assisted
selection (compare Tables 1 and 2). At least 80 markers were
required to recover 99% of the RP genome in just three BC
generations (Table 2). Use of at least 80 markers and 500
progeny allowed recovery of 98% RP in just two BC generations.
Response to selection was diminished only slightly by
spreading the effort over five selections. Using markers, the
number of backcross generations needed to convert an inbred is
reduced from about seven to three.
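
For comparison, the expected recovery without marker-assisted selection
follows the standard halving rule for unselected backcrossing. The sketch
below assumes that rule (the function name is illustrative, and it makes
no claim about the values actually printed in Table 1).

```python
def expected_rp(bc_generation):
    """Expected recurrent-parent fraction of the genome after the given
    backcross generation, with no selection: 1 - (1/2)^(t + 1)."""
    return 1.0 - 0.5 ** (bc_generation + 1)


for t in range(1, 8):
    print(f"BC{t}: {100 * expected_rp(t):.2f}% RP expected")
```

Under this rule the expected recovery is 75% in BC1 and 93.75% in BC3, and
it passes 99% only at BC6 to BC7, which is in line with the "about seven"
generations quoted for conventional backcrossing.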

By the BC3 generation, there appears to be no practical
advantage to using 500 vs. 100 individuals. If the presence of
the donor trait in the backcross individuals can be ascertained
before markers are genotyped, then only half the number of
individuals indicated in the tables will need to be analyzed.

When a small number of markers are used, they quickly
become non-informative; i.e., selection causes the marker loci
to become fixed for the RP type before the rest of the genome
is fully converted (Table 3; Hospital et al., 1992). This situation
was most prominent in the larger populations, where higher
selection intensity placed more selection pressure upon the
marker loci. Accordingly, it is of interest to consider how
closely the estimation of %RP based on markers reflects the
actual genome composition. The combination of estimation of
%RP based on fewer markers and subsequent selection tends to
bias the estimates upward (compare Tables 2 and 3).
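
The upward bias can be illustrated with a toy calculation. The sketch
below is not the authors' simulation: it assumes that each plant's true
%RP is drawn from an arbitrary uniform range, that markers behave as
independent draws (real markers are linked), and that the single plant
with the highest marker-based estimate is selected. All names and
parameter values are hypothetical.

```python
import random


def mean_overestimate(n_markers, n_plants=100, reps=100, seed=0):
    """Average of (estimated %RP - true %RP) for the plant that is selected
    because its marker-based estimate is the highest in the population."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        best_est, best_true = -1.0, 0.0
        for _ in range(n_plants):
            true_rp = rng.uniform(0.60, 0.90)       # assumed spread of true values
            hits = sum(rng.random() < true_rp for _ in range(n_markers))
            est = hits / n_markers                   # marker-based estimate
            if est > best_est:
                best_est, best_true = est, true_rp
        total += best_est - best_true
    return total / reps


for m in (20, 40, 80, 100):
    print(f"{m:3d} markers: mean overestimate {mean_overestimate(m):.3f}")
```

In this toy setting the overestimate is positive for every marker number
and shrinks as more markers are used, which is the qualitative pattern the
paragraph above describes.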

The results from the simulation compare well with real field
data. In a typical example, 50 BC1 plants carrying the gene being
transferred were genotyped at 83 polymorphic RFLP loci (note
that this corresponds to a population size of 100 unselected
plants in Tables 2 and 3). The five best BC1 recoveries had
estimated %RP values of 85.9%, [illegible], 82.0%, 81.4%, and
81.2%. After evaluating 10 BC2 plants from each selected BC1,
the best BC2 recovery had an estimated %RP of 94.6%.



Discussion



The simulations (Table 2; Hospital et al., 1992) and our
experience indicate that four markers per 200-cM chromosome
is adequate to greatly increase the effectiveness of selection in
the BC1. However, using only four markers per 200 cM will
likely make it very difficult to map the location of the gene of
interest. Adequate summarization of the data is an important
part of a marker-assisted backcross program. Ideally, the markers
used can supply data that can be represented as alleles of loci
with known map position. Estimation of %RP, mapping the
position of the locus of interest, and graphical display of the
results (Young and Tanksley, 1989) are all useful in understanding
and controlling the specific backcross experiment
being conducted.

It appears that, with the use of genetic markers, the portion
of the RP genome that is not linked to the allele being transferred
can be recovered quickly and with confidence. The
recovery of RP will be slower on the chromosome carrying the
gene of interest. A considerable amount of linkage drag is
expected to accompany selection for the DP allele in a backcross
program. For a locus located in the middle of a 200-cM
chromosome, the length of the chromosome segment accompanying
selection is expected to be 126, 63, and 28 cM in
the BC1, BC3, and BC7 generations, respectively (Hanson,
1959; Naveira and Barbadilla, 1992). Our observations support
the recommendation of Hospital et al. (1992) that preference be
given to the selection for recombinants proximal to the allele of
interest, but that selection for recovery of the RP elsewhere in
the genome also be considered. This two-stage selection can
probably be done quite effectively ad hoc by the breeder once
the data is adequately summarized; however, Hospital et al.
suggest ways to incorporate the two criteria into a selection
index such that each component of selection is assured appropriate
weighting.
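
The linkage-drag figures quoted above can be approximated with a simple
closed form. The sketch below assumes no interference and treats the
distance from the selected locus to the nearest crossover on each side,
accumulated over t backcross meioses, as an exponential variable
truncated at the end of the chromosome arm. Whether this is exactly the
expression used by Hanson (1959) is an assumption; the function name and
defaults are illustrative.

```python
import math


def expected_drag_cm(bc_generation, arm_morgans=1.0):
    """Expected length (cM) of donor segment kept around a locus held
    heterozygous for `bc_generation` backcrosses, with `arm_morgans` of
    chromosome on each side of the locus (1.0 M for the middle of 200 cM)."""
    t = bc_generation
    per_side = (1.0 - math.exp(-t * arm_morgans)) / t   # Morgans on one side
    return 100.0 * 2.0 * per_side


for t in (1, 3, 7):
    print(f"BC{t}: {expected_drag_cm(t):.1f} cM")
```

For one, three and seven backcross generations this gives about 126.4,
63.3 and 28.5 cM, close to the three lengths quoted in the paragraph above.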
Use of genetic markers can greatly increase the effectiveness
of backcrossing, and they should be used in any serious backcrossing
program if resources are available to the breeder.

Literature Cited

Allard, R.W. 1960. Principles of plant breeding. Wiley, New York.
Fehr, W.R. 1987. Principles of cultivar development: v.1. Theory and
    technique. Macmillan, New York.
Hanson, W.D. 1959. Early generation analysis of lengths of heterozygous
    chromosome segments around a locus held heterozygous with
    backcrossing or selfing. Genetics 44:833-837.
Hospital, F., C. Chevalet, and P. Mulsant. 1992. Using markers in
    gene introgression breeding programs. Genetics 132:1199-1210.
Rinke, E.H., and J.C. Sentz. 1961. Moving corn-belt germplasm [...]
Shaver, D.L. 1976. Conversions for earliness in maize inbreds. Maize
    Genet. Coop. News Lett. 50:20-23.
Young, N.D., and S.D. Tanksley. 1989. Restriction fragment length
    polymorphism maps and the concept of graphical genotypes. Theor.
    Appl. Genet. [...]
2 MAIZE - THE PLANT AND ITS PARTS

R. Scott Poethig

Department of Agronomy, Curtis Hall
University of Missouri
Columbia, MO 65211

One of the greatest deterrents to an appreciation of plant morphology is
the terminology used to describe various plant parts. This problem is
compounded in the case of maize because of its relatively unusual structure.
We all learn that plants have a vegetative body composed of stems, leaves
and roots, and that flowers contain sepals, petals, pistils and stamens.
Maize, however, has at least three kinds of leaves, two kinds of stems, two
kinds of roots, and two kinds of flowers in which glumes, lemmas and paleas
take the place of sepals and petals. Fortunately, these parts are arranged
in a relatively simple fashion, so the task of mastering maize morphology is
not as difficult as it might seem. In this article we identify some of the
most important parts of the maize plant and describe their organization.
More detailed descriptions of the developmental morphology of maize have
been provided by a number of investigators. Kiesselbach (1949, reprinted
1980) gives a good general picture of maize structure and development. The
external morphology and the histology of the vegetative and reproductive
shoots have been studied by Bonnett (1948, 1953), Sharman (1942) and Abbe
and co-workers (Abbe and Phinney, 1951; Abbe et al., 1951), while the most
comprehensive descriptions of the embryogeny are those of Randolph (1936)
and Abbe and Stein (1954). A summary of the histology of the corn plant,
written by Sass in 1955, has been reprinted in the recent edition of Corn
and Corn Improvement (1976).

The organization of the plant body: Maize is a member of the grass
family, the Gramineae, and as in all grasses, most of the plant body is leaf
tissue (Fig. 1a). To appreciate the general organization of the maize plant
it is helpful, therefore, to see it in a leafless state (Fig. 1b). Stripped
naked, the maize plant is not very impressive. Its main stem, or culm, is a
slender, segmented shaft similar to a stalk of bamboo or sugarcane. The
enlarged joints along the stem, the nodes, mark the points of leaf
attachment; the stem segment between nodes is called the internode. Each node
bears a single leaf in a position opposite that of the neighboring leaf, giving
the plant two vertical rows of leaves in a single plane (Fig. 1a; 2). This
so-called distichous phyllotaxy is typical of all leaf-like appendages,
wherever they occur on the plant.

Maize has unisexual, rather than bisexual, flowers. Male (staminate)
flowers are located at the apical tip of the main stem in the tassel, a
branched inflorescence. Female (pistillate) flowers are found in one to
several compact ears, located on the ends of short branches near the middle
of the stem (Fig. 1b; 2).

This partitioning of male and female flowers in separate structures 
distinguishes maize from other cereals and is one of the principal reasons 
that its genetics has been so conveniently explored. Making controlled
pollinations in maize requires little more effort than that involved in placing
a bag over the tassel and ear shoot. To perform a controlled pollination in
rice, wheat, barley and other cereals, it is necessary to emasculate each
flower used as a female parent, an especially tedious job when each flower
yields only one seed.





Figure 1. a) Mature maize plant (after Kiesselbach, 1949). b) Mature maize
          plant without leaves and adventitious roots. The apical
          end of the main stem (culm) terminates in the tassel, while the
          basal end terminates in the primary root (radicle). The ear shoot
          arises from an internode near the center of the culm.

Maize also differs from closely related species in that it has relatively
few branches. Only the lower 10 to 12 internodes of the stem produce
branch primordia, and most of these remain suppressed. Above-ground
primordia develop into ear shoots, while those located at subterranean
internodes develop into tillers, branches identical in structure to the main stem.
[...] hybrids (except sweet corns) generally tiller very little and
[...] a single viable ear shoot. In contrast some varieties
may have several large tillers and may produce 2 ears on the main stem and
some ears on tillers.

The stem: During the first four weeks after germination the growing
point of the stem lays down all the nodes and internodes of the plant and
then differentiates into a tassel. At the time of tassel formation the stem is
not more than 3-4 inches tall, even though the plant may be 3-4 feet in
height (Fig. 3). [...] most of the growth occurring at [...].
[...] 6-8 internodes do not [...], however, and remain below
ground where they produce the roots [...] tillers. These subterranean
internodes taper sharply towards the [...] stem, forming a distinctive
region, the crown (Fig. 1b). [...] tassel. All the internodes from
[...] ground [...] associated with the axillary bud
[...] the ear [...] axillary bud,
and are smoothly cylindrical.



Figure 2. The major parts of the maize plant. In part from [...];
          assembled by M. M. Johri and E. H. Coe.
The stem of an ear shoot, called the shank, [...]
main stem in being relatively short [...]
nodes of the shank [...]. Secondary ear shoots
tend to have a [...] of maize, but are rare in most
[...] apical ear is prevented.

Figure 3. [...] relatively short at this stage.

The tassel: The tassel, located at the top of the stem, [...]
series of large branches (spikes) [...]
bearing branches (spikelets; Fig. 2). Each branch point on a spike bears
two spikelets, one on a long stem (pedicellate) and one on a short stem
(sessile) (Fig. 4a). Each of these spikelets in turn produces two functional
florets. [...] and a pistil, the
[...] and are quite
common on tillers.
Figure 4. Schematic drawing of a pair of tassel spikelets (A) and a pair of
          ear spikelets (B). Note that the lower floret in the ear spikelet
          aborts early in development. p.s. - pedicellate spikelet; s.s. -
          sessile spikelet; gl - glumes; le - lemma; pa - palea; fl - floret.

Surrounding both florets on a spikelet are 2 leaf-like scales called
glumes (Fig. 2; 4a). Within the glumes, each floret is individually enclosed
in another pair of scales, one located adjacent to the glume (the lemma), the
other located between the two florets (the palea) (Fig. 4a). At anthesis,
these scales are forced apart by the swelling of conical structures
(lodicules) at the base of the 3 stamens, and the filamentous base of the
stamens elongates, forcing the anthers out of the flower (Fig. 2). As they
dangle downwards, the anthers shed pollen from openings at their tip.

Pollen grains are the multicellular products of the haploid microspores
that result from the meiosis of a microspore mother cell (microsporocyte).
Meiosis takes place in the anther before the tassel emerges from the leaf
sheaths. After meiosis, the 4 resulting haploid microspores separate from
each other, and each forms a thick wall. Shortly before shedding, each
microspore undergoes two mitotic divisions. The first division is asymmetric,
and produces a relatively large vegetative cell and a smaller generative cell.
In the second division, the generative cell divides to form two sperm cells.

The ear: The ear is morphologically similar to the tassel, although this
resemblance is obscured by differences in the relative size of their parts.
The crucial difference between them is, of course, that the tassel contains
male flowers, and the ear bears female ones. This difference is due simply
to the fact that during the formation of an ear floret stamen primordia are
arrested at an early stage in their development, while the pistil develops
fully. Each functional ear floret has a single ovary, which terminates in an
elongated style, or silk (Fig. 5). Within the ovary is a single embryo sac.
The embryo sac is the product of one of the four haploid cells resulting from
the meiosis of the megaspore mother cell. While its three sister cells
degenerate, the nucleus of this cell divides three times to produce 8 haploid
nuclei within a common cytoplasm (the embryo sac). Two of these nuclei
(polar nuclei) migrate to the center of the embryo sac, where they become
closely associated. The three nuclei remaining at the base of the embryo sac
subsequently undergo cellularization to form the egg cell and two synergids,
while the 3 nuclei at the tip of the embryo sac proliferate to form 24-48
antipodal cells.




Figure 5. Radial longitudinal section of an ovary with an unfertilized
          embryo sac (after Randolph, 1936). Upon fertilization, the
          nucellus is digested by the expanding embryo sac and the tissue
          surrounding the nucellus is transformed into the pericarp. si -
          silk; e.s. - embryo sac; nu - nucellus; in - integuments.

The ear also differs from the tassel in that it has no major lateral
branches. Its thick, lignified axis, the cob, is homologous to the central
spike of the tassel. As in the tassel, ear spikelets come in pairs, but in
the ear they are equal in size and only one of the florets in each spikelet is
functional (Fig. 4b). An ear therefore has an even number of parallel rows
of equally sized kernels equal to the number of spikelets on the cob. The
number of rows (or ranks) of kernels ranges from 4 to 30.

The glumes, lemmas and paleas of the ear spikelets are readily visible
in an unfertilized ear, but are soon obscured by the enlargement of the
ovary after fertilization. In a mature ear these structures are represented
by the chaff that adheres to the cob and the base of the kernel after it is
shelled.

The leaf: Maize produces three kinds of vegetative leaves: foliar
leaves, husk leaves and prophylls. A foliar leaf is located at each of the
nodes on the main stem, husk leaves are located on the shank of the ear
shoot, and prophylls are found at the base of the shank between the ear
shoot and the stem (Fig. 2).
The foliar leaf has two distinct parts: the blade, a flat portion
extending away from the stem, and the sheath, a basal part that wraps tightly
around the stem (Fig. 2). Internally, the blade consists of a spongy
[...] several longitudinal veins, and has a prominent
midrib. The sheath completely encircles the internode above the node to which
it is attached and may extend the entire length of that internode.
During the early development of the plant, the leaf sheaths provide most of
the mechanical support necessary to keep the stem upright. At the boundary
[...] in the leaf margin. The [...]
adjacent to this indentation is known as the auricle. The ligule is the thin
collar of filmy tissue located on the inside of the hinge.

The husk leaves surrounding the ear are usually considered modified
leaf sheaths with vestiges of the blade portions occasionally present. In
some strains, husk leaves develop a prominent ligule and leaf blade. In
contrast to the leaf sheath, husk leaves are relatively thin and flat. Each
husk leaf is attached to a unique node on the shank, and all but a few
upper ones are arranged distichously.

Located between an ear shoot and the stem, the prophyll looks
superficially like a husk leaf but is distinguished by having two keels (midribs)
and a split apex. These features suggest that the prophyll arose
evolutionarily from the fusion of two foliar leaves. The homology of the prophyll
is still controversial, however. Galinat (1959), for example, considers the
prophyll one of the basic units of maize morphology, the others being the
internode, leaf and axillary bud.

The root: More is known about the growth, cell biology, physiology
[...]. Cell division is restricted to the apical [...] of the
root and occurs at a maximal rate 1.25 mm behind the apex. The zone of
elongation extends 8 mm behind the apex [...]
Erickson (1979; 1980) and Green (1976) for an analysis of the growth
parameters that must be taken into consideration in such studies.

The primary root represents the basal end of the plant axis, which in
[...]. Primordia of a few adventitious roots are normally present in the
embryo [...]. Adventitious roots initiated above ground are known as
brace roots.

Adventitious roots [...] downwards. As a result, the root system of
[...] root system may be as [...] few [...] miles.

The kernel: The events surrounding [...]
unavailable. [...]
tube where they remain throughout its [...] of the pollen tube bursts,
[...] sac, 12 to 24 hours after [...] polar nuclei
[...] releasing the two sperm. One sperm [...] gives rise to the [...].

The development of the kernel following [...] in
detail by Randolph (1936). We will only note here that this process takes
[...] days and is accompanied [...];
the [...] is completed at about day 40, and the remaining
10-20 days is spent maturing and drying.

[...] is derived from the ovary wall and is therefore genetically identical
[...]. The [...] and embryo represent [...]
generation.

The endosperm makes up about 85% of the weight of the kernel, and [...]
concentrated to varying [...]
(Duvick, 1961). In flint-type kernels the concentration of starch and
protein bodies is higher around the periphery of the endosperm than in the
[...] and a soft, [...] and are
characteristic of specific races of maize. Other common endosperm traits,
such as sugary, floury or shrunken, are single gene mutations and can exist
in either a flint or dent background.




Figure 6. Longitudinal sections of a mature dent kernel taken
          perpendicular (left) and parallel (right) to the upper face of the kernel
          (after Kiesselbach, 1949). pe - pericarp; en - endosperm; al -
          aleurone; sc - scutellum; co - coleoptile; pl - plumule; ra -
          radicle; cr - coleorhiza.

Much of our understanding of gene action in maize is based on the
analysis of genes affecting the pigmentation of the external layer of the
endosperm, the aleurone. This specialized single cell layer is the only part
of the endosperm capable of becoming intensely pigmented. Internal
endosperm cells may be either yellow or white.

The embryo is located on the broad side of the kernel facing the upper
end of the ear, beneath a thin layer of endosperm cells. Most of the tissue
in the embryo is part of the scutellum, a spade-like structure concerned
with digesting and transmitting to the germinating seedling the nutrients
stored in the endosperm. The shoot and root axis are recessed in the outer
face of the scutellum. In a mature kernel, the shoot (plumule) has 5 to 6
leaf primordia that are arrested at successive stages of development (Abbe
and Stein, 1954). Surrounding the shoot is a cylindrical structure called
the coleoptile. Upon germination, the coleoptile elongates until it is above
ground and is then ruptured by the more rapid expansion of the rolled
leaves within it. The root is enclosed in a sheath of tissue called the
coleorhiza. Unlike the coleoptile, the coleorhiza does not elongate very
much and gives way to the radicle as soon as it emerges from the seed.
Attorney Docket No. 1328

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
Date: February 28, 2003

Applicant: Joseph Kevin Gogerty
Filed: January 11, 2001                Examiner: David T. Fox
For: "INBRED MAIZE LINE PH7CH"

Assistant Commissioner for Patents
Washington, D.C. 20231

RULE 132 DECLARATION
OF
DR. STEPHEN SMITH

Sir:

I, Stephen Smith, Ph.D., do hereby declare and say as follows:

1. I am skilled in the art of the field of the invention. I have a Ph.D. in Biochemical
Systematics and Taxonomy of Maize and its Wild Relatives from Birmingham University. I
have a M.Sc. in the Conservation and Utilization of Plant Genetic Resources from
Birmingham University. I have a Bachelor of Science degree in Plant Sciences from London
University. Since 1977 I have been engaged in the development, study and application of
molecular markers to genetics, measuring genetic diversity and tracking pedigrees. I
commenced this work at North Carolina State University as a post-doctoral research fellow. I
have continued my engagement in these studies during my employment by Pioneer Hi-Bred
from 1980 until the present. These studies have resulted in numerous scientific articles that
have appeared in peer-reviewed scientific literature.

2. I have read and understood the Office Action in the above case dated October 30,
2002. This declaration is in response to the Examiner's rejection under 35 U.S.C. § 112, first
paragraph, as containing subject matter which was not described in the specification in such
a way as to reasonably convey to one skilled in the relevant art that the inventor(s), at the
time the application was filed, had possession of the claimed invention.

3. I have conducted an analysis of Simple Sequence Repeat, SSR, marker data for base
inbred PH7CH and a backcross conversion of PH7CH. The trait backcrossed into the
backcross conversion of PH7CH was waxy starch.

4. The SSR data for 457 base inbreds and 103 backcross conversion inbreds, including
PH7CH and the backcross conversion, was used in the analysis. The number of SSR
markers for each inbred used in the analysis was between 15 and 87 (mean of 82). The
analysis was done as specified in the publication by Berry et al. ("Assessing Probability of
Ancestry Using Simple Sequence Repeat Profiles: Applications to Maize Hybrids and
Inbreds," Genetics 161:813-824, 2002) with modification as described in Berry et al. (2003;
"Assessing Probability of Ancestry Using SSR Profiles: Application to maize inbred lines and
soybean varieties," Genetics (in review)), a copy of which is attached hereto.

5. The results of the analysis indicated that through the use of SSR markers, PH7CH
was identified to be the recurrent parent of the backcross conversion of PH7CH over all the
other inbreds in the data set. The probability associated with the identification of PH7CH as
the recurrent parent of the backcross conversion was calculated as 0.99.

6. I hereby declare that all statements made herein of my own knowledge are true and
that all statements made on information and belief are believed to be true; and further that
these statements were made with the knowledge that willful false statements and the like are
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United
States Code and that such willful false statements may jeopardize the validity of the
application or any patent issued thereon.



Date: 2-28-03



Stephen Smith 
Attorney Docket No. 1328

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
Applicant: Joseph Kevin Gogerty        Date: Dec. 10, 2002

Serial No.: 09/758,867                 Group Art Unit: 1638

Filed: January 11, 2001                Examiner: David T. Fox
For: "INBRED MAIZE LINE PH7CH"

Assistant Commissioner for Patents

Washington, D.C. 20231

RULE 132 DECLARATION

OF

DR. STEPHEN SMITH

Sir:

I, Stephen Smith, Ph.D., do hereby declare and say as follows:

1. I am skilled in the art of the field of the invention. I have a Ph.D. in Biochemical
Systematics and Taxonomy of Maize and its Wild Relatives from Birmingham University.
I have a M.Sc. in the Conservation and Utilization of Plant Genetic Resources from
Birmingham University. I have a Bachelor of Science degree in Plant Sciences from
London University. Since 1977 I have been engaged in the development, study and
application of molecular markers to genetics, measuring genetic diversity and tracking
pedigrees. I commenced this work at North Carolina State University as a post-doctoral
research fellow. I have continued my engagement in these studies during my
employment by Pioneer Hi-Bred from 1980 until the present. These studies have
resulted in numerous scientific articles that have appeared in peer-reviewed scientific
literature.

2. I have read and understood the Office Action in the above case dated October
30, 2002. This declaration is in response to the Examiner's rejection under 35 U.S.C. §
102(e) as anticipated by or, in the alternative, under 35 U.S.C. § 103(a) as obvious over
Garing (U.S. Patent No. 6,034,304).

3. I have conducted an analysis of SSR marker data for inbred PH7CH and the
inbred cited as prior art, 90LDC2. Out of a total of 70 SSR loci examined, which allowed
a sampling of each chromosome, there are 41 markers that show differences between
PH7CH and 90LDC2. This represents a difference for 59% of the markers tested. Of
these 41 markers, 22 were greater than 50 cM in distance, or unlinked on the genetic
map.

4. Upon crossing PH7CH to any other maize line and selfing successive filial
generations, one would, within the realm of what is statistically possible, obtain a progeny
inbred maize line that retains genetic contribution from PH7CH. Assuming that (i) the
cited prior art is used as the maize line to which PH7CH is crossed, (ii) that the only
difference between PH7CH and 90LDC2 are these 41 markers, and (iii) that all markers
within a 50 cM distance will segregate together, then the odds of obtaining a PH7CH
progeny inbred that is the same as 90LDC2 after one cycle of breeding is 1 in 2^22 or 1
in 4,194,304. Statistically it is extremely unlikely that a PH7CH progeny, after one cycle
of breeding, would be the same as 90LDC2.

5. Further, the assumptions made above vastly overstate the likelihood of breeding
PH7CH from 90LDC2. For example, it is common practice in quantitative genetics to
determine the relation of plants by differences in markers. In doing so, one extrapolates
that a percentage difference in markers is indicative of a difference in the whole genome.
To assume that the only differences between PH7CH and 90LDC2 are for these 41
markers, when 41 markers constitute 59% of the 70 SSR loci examined, is a gross and
unrealistic assumption. Further, the current maize genetic map only has approximately
sixty 50 cM units, so by applying this limitation the maximum number of independently
segregating loci one could obtain, using the most different maize lines that could ever be
found, is sixty. These assumptions result in an overestimate of the odds of breeding
PH7CH from 90LDC2.

6. Given the differences in molecular markers between PH7CH and 90LDC2, it is my
expert opinion that PH7CH and 90LDC2 are very distinct inventions. It is also my expert
opinion that, within the realm of what is statistically possible, any progeny of PH7CH
developed through crossing PH7CH with another plant will be distinct from 90LDC2.
Given the facts and based on my education and scientific experience, I believe that the
invention as claimed is not obvious nor anticipated by Garing (U.S. Patent No.
6,034,304).

7. I hereby declare that all statements made herein of my own knowledge are true
and that all statements made on information and belief are believed to be true; and
further that these statements were made with the knowledge that willful false statements
and the like are punishable by fine or imprisonment, or both, under Section 1001 of Title
18 of the United States Code and that such willful false statements may jeopardize the
validity of the application or any patent issued thereon.
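
The arithmetic in paragraphs 3 and 4 of the declaration above can be
checked with a short Python sketch. It takes the declaration's own
simplifying assumptions at face value (22 independently segregating marker
groups, each with an even chance of matching 90LDC2 after one cycle of
breeding); the variable names are illustrative only.

```python
from fractions import Fraction

differing = 41   # SSR loci that differ between PH7CH and 90LDC2
examined = 70    # SSR loci examined
unlinked = 22    # differing markers more than 50 cM apart or unlinked

# Share of the examined loci that differ between the two inbreds.
pct_differing = 100 * differing / examined

# Each independent group matches with probability 1/2, so all 22 match
# together with probability (1/2)^22.
odds = Fraction(1, 2 ** unlinked)

print(f"{pct_differing:.1f}% of loci differ; odds = 1 in {odds.denominator:,}")
```

The percentage rounds to the 59% stated in paragraph 3, and the denominator
is the 4,194,304 stated in paragraph 4.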
ABSTRACT

Determining parentage is a fundamental problem in biology and in applications such as
identifying pedigrees. Difficulties inferring parentage derive from extensive inbreeding within
the population, whether natural or planned; using an insufficient number of hypervariable loci;
and from allele mis-matches caused by mutation or by laboratory errors that generate false
exclusions. Many studies of parentage have been limited to comparisons of small numbers of
specific parent-progeny triplets. There have been few large-scale surveys of candidates in which
there is no prior knowledge of parentage. We present an algorithm that determines the
probability of parentage in circumstances where there is no prior knowledge of pedigree and
which is robust in the face of missing data and mis-typed data. The focus is parentage of an
inbred line having uncertain ancestry. The algorithm is a variation of a previously published
hybrid-focused algorithm. We describe the algorithm and demonstrate its performance in
determining parentage of 43 inbred varieties of soybean that have been profiled using 236 SSR
loci and from seven inbred varieties of maize that were profiled using 70 SSR loci. We include
simulations of additional levels of missing and mis-typed data to show the algorithm's utility and
flexibility.
The determination of parentage using molecular marker data has been little addressed for
situations where there is little or no prior knowledge of parentage, or when large-scale surveys
involving numerous candidate parents are required. Consequently we have recently developed
an algorithm and demonstrated its use in determining probability of parentage for hybrids in
circumstances where there is no prior knowledge of pedigree and which is robust in the face of
missing or mis-typed data (Berry et al. 2002). We now present a variation of this algorithm that
allows determination of parentage for inbred lines or homozygous varieties.



We describe and evaluate a methodology that quantifies the probability of parentage of
homozygous genotypes. Our algorithm takes into account that generations of self-pollination
occur after the initial parental cross. The number of generations and the initial parental genotypes
are unknown. Each generation of inbreeding reduces the number of heterozygous loci in the
progeny by an average of 50%. Thus, each of the inbred progeny individuals resulting from the
initial parental cross will have lost approximately half of the parental alleles for loci where the
inbred parents were fixed for alternate alleles and which were heterozygous in the F1 generation.
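
The halving argument can be illustrated with a short calculation. The sketch below is illustrative only and is not part of the published algorithm: the function names are ours, and it assumes strict self-pollination, no selection, and loci that behave independently.

```python
def expected_heterozygous_fraction(generations: int) -> float:
    """Expected share of F1-heterozygous loci that are still heterozygous.

    Assumption (ours): strict self-pollination with no selection, so each
    generation halves the expected heterozygosity.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 0.5 ** generations


def probability_parental_allele_present(generations: int) -> float:
    """Chance that one given parent's allele is still carried at such a locus.

    The locus is either still heterozygous (allele present), or it has
    become fixed for one of the two parental alleles with equal chance.
    """
    het = expected_heterozygous_fraction(generations)
    return het + 0.5 * (1.0 - het)


# Tabulate the first few generations of selfing after the F1.
for g in range(7):
    print(g, expected_heterozygous_fraction(g), probability_parental_allele_present(g))
```

Under these assumptions the chance that a particular parental allele survives at a segregating locus falls from 1 in the F1 towards one half as inbreeding proceeds, which is the loss of approximately half of the parental alleles described above.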

The loss of parental alleles during the inbreeding phase is in contrast to the case of a hybrid
progeny. An inbred progeny individual will exhibit a lower level of allelic similarity to either of
its inbred parents than a hybrid progeny will to its inbred parents. This loss of some parental
alleles during inbreeding might be expected to make an inbred algorithm less robust in the face
of missing or mis-typed data compared with the hybrid algorithm that has been previously
described (Berry et al. 2002). We therefore demonstrate the effectiveness and robustness of the
inbred algorithm using examples from two species of cultivated plants. We first tested the
algorithm using varieties of the naturally self-pollinating, inbred crop, soybean [Glycine max
(L.) Merr.]. This crop was selected because numerous varieties of soybean with known pedigrees
were available to us, many of which are closely related. We also used publicly bred inbreds of
maize (Zea mays L.) that are of known pedigree. Maize is a naturally outcrossing species but
inbred lines are most usually generated for use as parents of commercial hybrids. Inbred lines are
generated by making successive generations of self-pollination following the initial bi-parental
cross.



MATERIALS AND METHODS

Algorithm: The algorithm is a variation of the hybrid version of Berry et al. (2002). Consider an
index inbred whose parentage is unknown or in dispute. A database containing possible inbred
ancestors is available. The objective is to find the probabilities of closest ancestry for each inbred
in the database using genotypic information from a large number of SSRs.

Consider a pair of possible ancestors, inbred i and inbred j. We calculate the probability that
inbreds i and j are in the index's ancestry, repeating this for all pairs of inbreds in the database.
Let P(i,j|SSRs) stand for the posterior probability that i and j are ancestors of the index given the
information from the various SSRs. Let P(i,j) stand for the unconditional (or prior) probability of
the same event and let P(SSRs|i,j) be the probability of observing the various SSR results if in
fact i and j are ancestors of the index. Just as in Berry et al. (2002), Bayes' rule relates these
various probabilities:
P(i,j|SSRs) = P(SSRs|i,j)*P(i,j) / Σ[P(SSRs|u,v)*P(u,v)]



where the sum in the denominator is over all pairs of inbreds in the database, indexed by u and v.
We need to calculate P(SSRs|i,j) for each i and j. We will make the "no-prior-information"
assumption that P(i,j) is the same for all pairs (i,j). Then P(u,v) is a common multiple in the
denominator that cancels with P(i,j) in the numerator:

P(i,j|SSRs) = P(SSRs|i,j) / Σ P(SSRs|u,v)
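
As a minimal sketch of this normalization step, the following Python function turns a table of pair likelihoods P(SSRs|i,j) into posteriors under the equal-prior assumption. The function name, the inbred labels A, B and C, and the likelihood values are hypothetical and chosen only for illustration.

```python
def pair_posteriors(likelihoods):
    """Return P(i,j|SSRs) for every candidate pair under a uniform prior.

    likelihoods: dict mapping an unordered pair (i, j) to P(SSRs|i,j).
    With equal priors the prior cancels, so each posterior is simply the
    pair's likelihood divided by the sum over all pairs.
    """
    total = sum(likelihoods.values())
    if total <= 0:
        raise ValueError("at least one pair must have a positive likelihood")
    return {pair: value / total for pair, value in likelihoods.items()}


# Hypothetical likelihoods for three candidate pairs.
example = {("A", "B"): 0.006, ("A", "C"): 0.003, ("B", "C"): 0.001}
posteriors = pair_posteriors(example)
for pair, prob in sorted(posteriors.items(), key=lambda item: -item[1]):
    print(pair, round(prob, 3))
```

Only the relative sizes of the likelihoods matter here, so any constant factor common to all pairs drops out in the same way that the prior does.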

The problem is to calculate a typical P(SSRs|i,j), the probability of observing the index's SSRs
assuming inbreds i and j are both ancestors. The nature of breeding before the self-pollination
process is unknown. Since the creation of an inbred proceeds by multiple generations of self-
pollination on a hybrid, we label the (unknown) hybrid used to create the (known) index inbred
as the intermediate hybrid. When the intermediate hybrid is an immediate descendent of i and j,
it receives one of inbred i's alleles and one of inbred j's alleles. When the intermediate hybrid is
a second generation descendent of i and j, it receives one allele from each with probability 0.5.
And so on. Since degree of ancestry (if any) is unknown, we label the actual probability of
passing on one of these alleles to the intermediate hybrid to be p. As in Berry et al. (2002), we
consider p = 0.50 and p = 0.99 and here we also consider the intermediate value p = 0.75.



When inbreds i and j are ancestors then there are four possibilities: (1) the alleles of both i and j
were passed to the intermediate hybrid, (2) i came through but not j, (3) j came through but not i,
and (4) neither came through. Assuming independence, these have respective probabilities p^2,
p(1-p), p(1-p), (1-p)^2. An allele in the intermediate hybrid's genotype that did not arise from
either inbred i or inbred j is assumed to be selected with probability 1/n, where n is the total
number of alleles at the SSR in question. So far the steps we have described are identical to those
for identifying the ancestors of a hybrid. If the index inbred
is heterozygous at an SSR then calculations proceed just as for hybrids. Calculations are
substantially different when the index inbred is homozygous, say genotype aa. Cases that must
be considered are shown in Table 1, where x is any allele different from a (but not missing). All
alleles other than a can be grouped because only a appears in the index's genotype. For example,
xx might be bc or bd or bb.

P(SSR|i,j) is the probability of observing the index assuming inbreds i and j are ancestors. The
calculations for SSRs 1 to 6 are shown in Table 2, where the four terms in each case are in order
of (1), (2), (3), (4) defined in the previous paragraph. Missing alleles are not considered in the
examples above. The number of possibilities is large. Here we consider only the case in which
inbred i is aa and both alleles of inbred j are missing. Then

P(SSR|i,j) = p^2(1/2 + 1/2*1/n) + p(1-p)(1/2 + 1/2*1/n) + p(1-p)(1/n) + (1-p)^2(1/n)

Another possibility not considered above is that more than two alleles can be observed for an SSR
marker run on an individual DNA sample. This can be due to SSR locus duplication, homology due to
alloploidy, more than one individual plant being sampled for DNA extraction, or cross-contamination. In
this case we consider all possible pairings of the observed alleles and calculate using a multiple
imputation procedure (Little and Rubin, 1987).
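
The four transmission scenarios and the missing-data case described above can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function names are ours, and the per-scenario factors (1/2 + 1/2*1/n for the scenarios in which inbred i's allele comes through, 1/n otherwise) are taken from our reading of the expression in the text, with inbred i assumed to be aa and inbred j fully missing.

```python
def transmission_weights(p: float) -> dict:
    """Probabilities of the four scenarios, assuming independent transmission.

    p is the chance that an ancestor's allele reaches the intermediate hybrid.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return {
        "both": p * p,                # (1) alleles of both i and j came through
        "i_only": p * (1.0 - p),      # (2) i came through but not j
        "j_only": (1.0 - p) * p,      # (3) j came through but not i
        "neither": (1.0 - p) ** 2,    # (4) neither came through
    }


def p_ssr_i_aa_j_missing(p: float, n: int) -> float:
    """P(SSR|i,j) for an aa index when inbred i is aa and inbred j is missing.

    n is the total number of alleles at the SSR. The two factors below are
    assumptions read from the expression in the text.
    """
    if n < 1:
        raise ValueError("n must be a positive allele count")
    w = transmission_weights(p)
    with_i = 0.5 + 0.5 / n   # factor used when i's allele came through
    without_i = 1.0 / n      # factor used when i's allele did not come through
    return (w["both"] * with_i + w["i_only"] * with_i
            + w["j_only"] * without_i + w["neither"] * without_i)


for p in (0.50, 0.75, 0.99):
    print(p, transmission_weights(p), p_ssr_i_aa_j_missing(p, n=4))
```

The four weights always sum to one, so the result is a weighted average of the two factors; as p approaches 1 it approaches 1/2 + 1/(2n), and as p approaches 0 it approaches 1/n.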
To find the overall P(SSRs|i,j), multiply the individual P(SSR|i,j) over the various SSRs. To
determine the probability that any particular inbred, say inbred i, is the closest ancestor of the
index, sum P(i,v|SSRs) over all inbreds v with v ≠ i. Call this P(i|SSRs). The maximum of
P(i|SSRs) for any inbred i is 1. But since there is one closest ancestor on each side of the family,
the sum of P(i|SSRs) over all inbreds i is 2.
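
A minimal sketch of these two steps is given below, assuming that every per-SSR probability is strictly positive and that each unordered pair of inbreds appears once. The function name and the example values are hypothetical. Logarithms are used only to avoid numerical underflow when many SSRs are multiplied; this is our implementation choice, not something stated in the text.

```python
import math
from collections import defaultdict


def closest_ancestor_probabilities(per_ssr):
    """Return P(i|SSRs) for each inbred from per-SSR pair probabilities.

    per_ssr: dict mapping an unordered pair (i, j) to a list of
    P(SSR|i,j) values, one per SSR, all strictly positive.
    """
    # Product over SSRs, computed as a sum of logs.
    log_like = {pair: sum(math.log(x) for x in probs)
                for pair, probs in per_ssr.items()}
    # Rescale by the largest value before exponentiating, then normalize
    # over all pairs (uniform prior).
    peak = max(log_like.values())
    weights = {pair: math.exp(value - peak) for pair, value in log_like.items()}
    total = sum(weights.values())
    # Each pair's posterior is credited to both of its members.
    marginal = defaultdict(float)
    for (i, j), weight in weights.items():
        marginal[i] += weight / total
        marginal[j] += weight / total
    return dict(marginal)


example = {
    ("A", "B"): [0.5, 0.4],
    ("A", "C"): [0.5, 0.1],
    ("B", "C"): [0.1, 0.1],
}
result = closest_ancestor_probabilities(example)
print(result, sum(result.values()))
```

Because every pair posterior is counted once for each of its two members, the values returned for the individual inbreds add up to 2 and none can exceed 1.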

SSR data: Soybean DNA was extracted from 490 varieties, all of which were bred in, and are
adapted to, the United States. Plant material for DNA extraction was sampled from six plants of
each variety. Most of the varieties are proprietary products of Pioneer Hi-Bred International.
Several (non-patented) commercial varieties from other breeding companies and some important
publicly bred varieties were also included. Procedures for obtaining SSR data from soybean were
identical to those described for maize by Berry et al. (2002) apart from the following
modifications: PCR products with different size ranges and labeled with different fluorochromes
were pooled and diluted 1:9 with capillary electrophoresis buffer (Applied Biosystems) then 1:4
with dH2O. 1.5 µl of pooled DNA were added to 10 µl formamide containing the molecular-weight
size standard 400HD ROX (Applied Biosystems, ROX = 6-carboxy-X-rhodamine). Fragment
separation was performed using capillary electrophoresis on an ABI3700 platform (Applied
Biosystems), with an injection time of 10 sec at 10,000 V and a run time of 4,000 sec at 7,500 V.

Forty-three soybean varieties that had both of their parent varieties also included in the dataset
were assigned as index varieties. One to two and occasionally three grandparent varieties of
several of the index varieties were also included in the dataset. These varieties collectively
represent a broad array of diversity of soybean germplasm that is currently grown in the United
States.

Two hundred and thirty-six publicly available soybean SSR markers
(http://soybase.agron.iastate.edu/) were used to demonstrate and evaluate the algorithm. These
SSR markers were selected following initial screens on a subset of 24 soybean varieties in which
they were tested for amplification and the ability to detect polymorphism. The 236 markers gave
good genome coverage and collectively mapped across each of the chromosomal linkage groups
of soybean.

All allele scores were made without knowing the identities of the soybean genotypes.

Maize SSR data using 70 loci were previously reported by Senior et al. (1998) and were obtained
directly from the first author. This publication (Senior et al. 1998) cites an array of 94
historically important publicly bred lines that have well known and well established pedigrees.
This array of public inbreds includes seven inbreds (A632, A634, Mo17, Pa91, Va35, Va99 and
W64A) that each have SSR profiles for their parental lines included in the same dataset. Three of
these inbreds were developed from a breeding cross of two unrelated parents. These are: Mo17,
which was bred from the cross of C.I. 187-2 x C103; Va99, which was bred from the cross
Oh07B x Pa91; and W64A, which was bred from the cross of WF9 x C.I. 187-2. Other inbred
progeny had more complex pedigrees. One inbred (Va35) was bred from the cross C103 x T8
following an additional cross of T8 as the recurrent parent. Two inbreds (A632 and A634) were
bred from the cross Mt42 x B14 following additional crosses of B14 as the recurrent parent.
Pa91 was bred from a complex cross involving four inbreds (WF9 x Oh40B) and (38-11 x
L317). These seven progeny inbreds therefore provided an index set of maize inbreds for
evaluation of the inbred algorithm.

RESULTS

Data quality: The soybean SSR data that were used to evaluate the algorithm had a mean of
5.5% (range 0-19% loci) missing data per variety. For parent-progeny triplets, there was a mean
of 1.1% loci (range 0-5%) where a progeny profile was scored for an allele that was not
represented by either of the seed sources that represented the parents. The maize SSR data had a
mean of 0.7% missing data (only three genotypes had missing data; these were at elevated levels
of 5%, 9%, and 36%). A mean of 6.4% parent/progeny triplets (range 4-7%) had SSR progeny
profiles that did not share an allele with either of the seed sources that were available to represent
the original parental genotypes.

Probability of ancestry applied to soybean data: Figures 1 and 2 present the probabilities of
closest ancestry of the top ranking varieties for each of 43 soybean varieties using data from 236
marker loci at p = 0.50 (Fig 1) and at p = 0.99 (Fig 2).

When the algorithm was used at p = 0.5 with data from all 236 loci (Fig 1), then 24/43 (56%) of
index varieties had both parents correctly identified in the top two ranked positions, 12/43 (28%)
had one parent correctly placed in one of the top two positions, and 7/43 (16%) had none of the
actual parents assigned into the top two ranked positions. Thus, when p = 0.5 was used, 60/86
(70%) of actual parental varieties were correctly ranked in the top two positions and 26/86 (30%)
were incorrectly placed in lower positions.



When the algorithm was used at p = 0.75 with data from all 236 loci (data not shown), 28/43
(65%) of index varieties had both parents correctly identified in the top two ranked positions,
11/43 (26%) had one parent correctly placed in one of the top two positions, and 4/43 (9%) had
none of the actual parents assigned into the top two ranked positions. Therefore, when p = 0.75
was used, 67/86 (78%) of the actual parental varieties were correctly ranked in the top two
positions and 19/86 (22%) were incorrectly placed in lower positions.

When the algorithm was used at p = 0.99 with data from all 236 loci (Fig 2), then 33/43 (77%) of
index varieties had both parents correctly ranked in the top two positions and 10/43 (23%) had one
parent correctly placed; all index varieties had at least one parent ranked in the top two positions
when the algorithm was used at p = 0.99. With p used at 0.99 then 76/86 (88%) of actual parental
varieties were correctly assigned; 10/86 (12%) were incorrectly assigned.
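
The per-variety and per-parent figures quoted for the three levels of p are linked by simple arithmetic, which the short check below reproduces. The function name is ours. The p = 0.99 counts are taken as 33 varieties with both parents placed and 10 with one parent placed, which is what the reported 76/86 implies.

```python
def parent_tally(both: int, one: int, none: int):
    """Convert per-variety outcomes into per-parent counts.

    both / one / none: numbers of index varieties with two, one or zero
    true parents ranked in the top two positions.
    Returns (parents correctly placed, total parents).
    """
    varieties = both + one + none
    return 2 * both + one, 2 * varieties


for label, counts in {"p=0.50": (24, 12, 7),
                      "p=0.75": (28, 11, 4),
                      "p=0.99": (33, 10, 0)}.items():
    correct, total = parent_tally(*counts)
    print(label, f"{correct}/{total}", f"{100 * correct / total:.0f}%")
```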

Table 3 presents the rankings, probabilities, and pedigrees of varieties that were incorrectly
assigned above a true parent. The largest pedigree class (41% of cases where a non-parent ranked
above a true parent) of non-parents ranking higher than parents was for varieties that are
derivatives of the parent that was misplaced at a lower ranking. The equal second largest classes
(each representing 14% of the cases) were for varieties that were (a) full sibs of the true but
misplaced parent and (b) full sibs of a grandparent of the variety for which the pedigree was
being tested. Other categories (percent of cases in parentheses) were: multiple backcross versions
of the misplaced parent (7%), a derivative of the variety for which the pedigree was being tested,
a half-sib of the true but lower ranked parent (7%), a full sib of the variety for which the
pedigree was being tested (3%), and a half-sib of the variety for which the pedigree was being
tested (3%). Insufficiently detailed pedigree information is available to categorize the variety
(3% of cases) that ranked above the true parent.

Robustness: The quality of soybean SSR data as received from the laboratory, in terms of
missing data and apparently non-Mendelian parent-progeny triplets, has already been presented.
Taking these data as an initial starting point, additional levels of missing and mis-typed data
were created by simulations and used to explore robustness of the algorithm.

SSR data for five index soybean varieties were used to determine the robustness of the algorithm.
Subsets of data were created that included parameters of reduced numbers of loci, additional
levels of missing data, additional levels of mis-typed data, and various combinations of these
parameters. Simulated levels of missing and mis-typed data were created with a first pass
creating missing data, followed by a second pass creating mis-typed data. Therefore, for
example, the maximum level of cumulative error from simulated missing and mis-typed data was
from 36 to 40%. Five varieties were chosen to represent a range of diversity in respect of both
pedigree and SSR profiles. The set included varieties that had no parents or grandparents in
common and one pair of varieties that was related by a common parent. All varieties had parents
ranked in the top two positions when the algorithm was run at p = 0.5 and p = 0.99. This selection
of varieties therefore provides a means to establish lower boundaries for both the quantity and
quality of SSR data that are required to avoid aberrant results.
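
The two-pass scheme can be sketched as follows. This is an illustrative simplification, not the simulation code used in the study: the function name is ours, each locus is reduced to a single allele label (as for a homozygous inbred), and the second pass is assumed to draw from all loci, so a locus already set to missing in the first pass simply stays missing. Under that assumption, 20% missing plus 20% mis-typed data changes between 20% and 40% of loci, with about 36% expected, which is one possible reading of the 36 to 40% range quoted above.

```python
import random


def simulate_errors(genotypes, alleles, frac_missing, frac_mistyped, seed=0):
    """Apply a missing-data pass followed by a mis-typing pass.

    genotypes: list with one allele label per locus.
    alleles: list of at least two possible allele labels at a locus.
    Returns a new list in which missing loci are represented by None.
    """
    rng = random.Random(seed)
    result = list(genotypes)
    n_loci = len(result)

    # First pass: set a fixed share of loci to missing.
    for idx in rng.sample(range(n_loci), round(frac_missing * n_loci)):
        result[idx] = None

    # Second pass: mis-type a fixed share of loci drawn from all loci;
    # loci that are already missing are left as missing.
    for idx in rng.sample(range(n_loci), round(frac_mistyped * n_loci)):
        if result[idx] is not None:
            wrong = [a for a in alleles if a != result[idx]]
            result[idx] = rng.choice(wrong)
    return result


original = ["a"] * 1000
corrupted = simulate_errors(original, ["a", "b", "c"], 0.20, 0.20, seed=1)
missing = sum(x is None for x in corrupted)
changed = sum(o != c for o, c in zip(original, corrupted))
print(missing / len(original), changed / len(original))
```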
Table 4 presents the probability of ancestry of the top five ranked varieties for each of five
selected soybean index varieties (93B11, A7986, P9443, S38T8 and Young) when the algorithm
is run using different numbers of SSR marker loci (50, 100, 150 and 236) at each of two levels of
p (0.5 and 0.99). Using p = 0.5, the lowest percentage of parents (60%) that were correctly
ranked into the top two positions corresponded to using only 50 SSR. Increasing the number of
loci to 100 or 150 of 236 increased the ability to identify the actual parents to about 90%. When
p was used at a level of 0.99 all parents were correctly ranked into the top two positions for each
of the five varieties when data from as few as 50 SSR loci were used.

Table 5 summarizes other aspects of robustness. Namely, we simulated additional levels of
missing, mis-typed and missing plus mis-typed data, beyond those that were inherent in the data
as provided by the laboratory. When p was used at a level of 0.5, robustness was generally
maintained up to an additional level of 20% simulated missing data, so long as data from 100 or
more loci were used. Similarly, robustness was maintained for up to 20% additional mis-typed
data so long as data from 100 or more loci were used. Likewise, robustness was maintained with
up to 18 to 20% additional levels of data error including both missing and mis-typed data, so
long as data from 150 or more loci were used. Using data for all 236 loci provided a higher level
of robustness, but even then robustness collapsed when 36 to 40% cumulative additional error
from missing and mis-typed data were simulated into the analysis. The overall level of correct
assignation of parent varieties was higher when p was used at a level of 0.99. All parents then
were correctly identified, even when data from only 50 loci were used, up to an additional level
of 10% missing data. When data from 100 or more loci were used then all parents were correctly
identified with up to 20% additional missing data. Robustness started to decline when the
algorithm was applied with 10% additional mis-typed data when data from 150 or fewer SSR
loci were used. However, robustness was maintained for up to 20% additional mis-typed data
when data from 236 SSR loci were used. When additional levels of both types of incorrect data
were applied then robustness was maintained at levels of up to 10% missing plus 10% mis-typed
data so long as data from at least 150 SSR loci were used. Robustness was compromised when
additional simulations of 20% missing plus 20% mis-typed data were applied even when data
from all 236 SSR loci were used.

We then investigated the relationships of varieties to the index genotype whose pedigree was
under examination by rerunning the analysis after both parents of the index genotype had been
removed from the analysis. Fifteen varieties that had two or more of their grandparents profiled
in the dataset were used for this examination. After removing parents, direct pedigreed
derivatives of the index genotype ranked first for P9583, in the first three places for A2943 and
in the first six places for P9561. Once all parents and derivatives of the index genotype had been
removed from the analysis then the following results were obtained. Predominant classes of
varieties ranking in the top five positions were (percent of cases in parentheses): derivatives of
the grandparent of the index variety (32%), grandparents of the index variety (16%), derivatives
of the parents of the index variety (16%), and half-sibs of the index variety (13%). Grandparents
ranked among the first four positions for 10 varieties and were in the first place for five varieties.
Great-grandparents ranked within the first seven places for three varieties, and a great-great-
grandparent ranked in eighth place for one variety. Other varieties that ranked in the first place
were usually closely related to the variety whose pedigree was under examination; full-sibs and
half-sibs were the predominant classes of relatives other than grandparents in the first ranking
position after parents and direct derivatives of the variety under examination had been removed.



Probability of ancestry applied to corn data: The seven index inbreds of maize were selected
because they represented all of the inbred lines published upon by Senior et al. (1998) that had
all of their inbred parents also included in the SSR dataset. All of the inbred lines published by
Senior et al. (1998) have well known and well established pedigrees that are fully provided by
those authors.

Table 6 presents probabilities of ancestry for the top five ranked inbreds for each of the seven
index inbred lines at two levels of p (0.5 and 0.99). For the three progeny that were bred from
single crosses without any subsequent use of one of the parents to make a recurrent cross prior to
inbreeding (Mo17, Va99, and W64A) then use of the algorithm at either p = 0.5 or at p = 0.99
resulted in the parental inbreds being ranked in first and second positions. Use of the algorithm at
p = 0.99 provided greater discrimination for probabilities of ancestry that were assigned to actual
parents compared to highest ranking non-parents. This was most noticeable for the case of inbred
Va99 which had a relatively low value when used at p = 0.5 for parent 2 (0.5221) compared to
parent 1 (0.9999) or to the third ranked inbred (and non-parent), Va22 (0.4252). In contrast,
when the program was run at p = 0.99 then parent 1 and parent 2 for Va99 had probabilities of 1
and 0.9855, respectively, with the probability of the [...].

For each of the three progeny inbreds that originated from breeding schemes that involved one or
more additional crosses of one of their parents, using the algorithm at p = 0.5 resulted in
placement of the respective recurrent parent with the highest probability of ancestry. Raising the
level of p to 0.99 resulted in both parents (B14 = recurrent parent and Mt42 the non-recurrent
parent) of the index inbred A632 being ranked in the top two places. Using this level of p also
caused a higher ranking (third position) for the non-recurrent parent (Mt42) of index inbred
A634. Use of p at 0.99 did not cause the non-recurrent parent (C103) of index inbred (Va35) to
rank into the top five places.

For the index inbred (Pa91) that was bred from a more complex cross involving four inbred
lines, the use of p at 0.5 or at 0.99 resulted in the two parents (WF9 and Oh40B) being ranked in
second and third places; highest ranked was inbred Va99 (Va99 is derived from the index inbred
Pa91). Neither of the two remaining parents of Pa91 ranked in the top five places.

DISCUSSION 



The current widely used North American soybean varieties are founded upon a relatively narrow
genetic base of diversity. Gizlice et al. (1994) document that the U.S. soybean germplasm base
is founded upon 20 plant introductions and that subsequent breeding has made repeated use of
related parents. Molecular marker comparisons of elite U.S. soybean varieties compared to a
sample of exotic varieties reinforce the conclusion that there is a relative paucity of genetic
variation in U.S. soybeans; Narvel et al. (2000) have shown that the number of alleles detected
among the exotics was 30% greater than among U.S. varieties. Thompson and Nelson (1998)
report that very little exotic germplasm has been incorporated into the existing U.S. soybean
germplasm base. Examining all pairs of pedigree relationships among the 490 soybean varieties
employed in this study showed that approximately 50% of pairwise relationships are related at
the level of half-sib or closer; approximately 10% of pairs are related at the level of full-sib or
closer. This set of soybean varieties therefore provides the basis for an extremely rigorous
evaluation of the ability of SSR data to distinguish between varieties and of the algorithm to
identify pedigrees. Pedigree breeding, including the use of related parents, is also commonly
applied in the breeding of maize inbred lines. The set of maize inbreds used here thus also
provides a meaningful evaluation of the marker data to discriminate among inbred lines and of
the joint ability of the algorithm and of the marker data to allow a determination of inbred
pedigrees.



Use of the algorithm at p = 0.99 rather than at a lower level improved performance in terms of
the percentage of correct assignations of parents and provided a greater statistical differential for
probabilities for parents in comparison to the highest ranking non-parents. Use of the algorithm
at p = 0.99 is more appropriate when it is known that the actual parents of the variety under
examination are included among the set of index varieties. If it is not known that the parents are
included in the index set then use of the algorithm at p = 0.5 is more justified (Berry et al. 2002).
For the soybean varieties, when p was used at 0.99, then 77% of all varieties that were queried
for their parents had both parents correctly identified. Eighty-eight percent of soybean parents
were correctly identified across 43 index varieties that were queried for their parents. All
varieties (with the possible exception of the one variety where detailed pedigree information was not
available) that ranked above true parents were related either to the mis-ranked parent or to the
variety that was being queried for its pedigree. Our previous report of the use of an algorithm to
determine hybrid pedigrees (Berry et al. 2002) showed a higher level of correct parental
determinations at p = 0.99. Many of these soybean varieties have a high degree of pedigree
relatedness. However, many of the maize inbred lines that were used in the previously reported
study (Berry et al. 2002) were also highly related. It is, however, likely to be inherently more
challenging to correctly identify parents following cycles of inbreeding because half of the
alleles that are segregating in the first generation following the initial breeding cross will be
subsequently lost as recurring cycles of self-fertilization occur. Thus, many of the alleles that are
present in a hybrid, and which can therefore contribute to the identification of its pedigree, do not
remain present in an inbred homozygous progeny.

We examined the pedigrees of soybean index varieties when both parents of the index had been
removed from the set of candidate varieties. Direct pedigree descendents with the index variety
as one parent then usually ranked higher than other varieties, including varieties that were
grandparents or sister varieties of the index variety. When all parents and direct derivatives of the
index variety were excluded from the analysis then the predominant classes of varieties ranking
in the top five positions were derivatives of the grandparent of the index variety (32%),
grandparents of the index variety (16%), derivatives of the parents of the index variety (16%),
and half-sibs of the index variety (13%). The SSR data that were available to us did not allow a
thorough or very precise assessment of how varieties with different degrees of relatedness would
rank as members of the pedigree in the event that the true parents were not present in the
database. Nonetheless, when parents were excluded from the analysis then varieties that were
very closely related to the index variety ranked highest. Direct descendents, dependent for their
pedigree upon the index variety, if present, tended to rise above varieties included within other
classes of pedigree relationship to the index variety. When varieties directly descended by
pedigree from the index variety were also excluded then a grandparent ranked into first position
for 33% of the varieties that were examined. Direct pedigree derivatives of one or more of the
parents of the index variety had an equal level of occurrence when parents and derivatives of the
index variety were excluded. Further investigations of the identification of grandparents will
require a dataset including all grandparents of each index variety and will also require a revised
algorithm to take account of pedigree contributions from four varieties as opposed to pairs of
varieties which forms the basis of the current inbred algorithm.

For the maize inbred line pedigrees, use of the algorithm either at p = 0.5 or at p = 0.99 resulted
in the correct identification of both parents in all cases where the breeding scheme was an initial
cross of two parental lines followed by subsequent cycles of inbreeding (i.e., for the inbreds
Mo17, Va99 and W64A). The relatively high level of robustness for results with maize inbreds at
p = 0.5, in contrast to the results obtained from analyzing soybean data (where 56% of varieties
had both parents correctly identified when p = 0.5 was used), could be accounted for by the
smaller sample size of maize inbreds and by the lower degree of mean pedigree relatedness
amongst this selection of inbred lines in comparison to the soybean varieties. Thus, while several
inbred lines in this set are closely related, there remain many inbreds that have little or no

pedigree relationship (Senior et al. 1998).

The inbred algorithm correctly identified both parents of the three maize index inbreds that had
been bred from bi-parental crosses that involved equal contributions (by pedigree) from both
parents. For the three bi-parental crosses that involved subsequent additional crosses of the
recurrent parent (and thus significantly biased contributions by pedigree to the index variety
from the recurrent parent), use of the algorithm correctly identified each of the recurrent
parents. The algorithm was unable to identify the non-recurrent parent in most cases, but this
result would be expected because one backcross reduces the expected pedigree contribution of
the non-recurrent inbred to 25%. More generations of backcrossing using the recurrent parent
then further reduce the expected pedigree contribution of the non-recurrent parent by half at each
generation (successively to 12.5%, 6.25%, 3.125%), with the pedigree contribution of the
recurrent parent rising accordingly. Since several inbred lines of maize are related by pedigree,
it is not surprising that the level of pedigree or SSR similarity of a non-recurrent parent to
the index progeny can fall below other inbred lines that are related to the index variety. The
algorithm was not able to preferentially identify parents of the inbred line Pa91, which was bred
from a complex breeding scheme involving four parents with equal contributions by pedigree. A
more suitable algorithm is needed to take account of four-way crosses. However, such a need is
primarily academic because most breeding crosses in commercial maize breeding, and indeed for
most crops, are bi-parental.
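The halving of the non-recurrent parent's expected contribution can be checked with a few lines of Python. This is only an illustrative sketch; the function name `nonrecurrent_contribution` is hypothetical and is not part of the algorithm described here.

```python
def nonrecurrent_contribution(backcrosses: int) -> float:
    """Expected pedigree share of the non-recurrent parent.

    The initial bi-parental cross gives each parent 50%; every backcross
    to the recurrent parent halves the non-recurrent share.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be non-negative")
    return 0.5 ** (backcrosses + 1)


for n in range(0, 5):
    share = nonrecurrent_contribution(n)
    # The recurrent parent's expected share rises by the same amount.
    print(f"{n} backcross(es): non-recurrent {share:.4%}, recurrent {1 - share:.4%}")
```

With one backcross the sketch gives 25%, and with two, three and four backcrosses 12.5%, 6.25% and 3.125%, matching the sequence quoted in the text.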

These soybean data had a mean of 5.5% missing data per variety and a mean of 1.1% loci where
a progeny was scored with an allele that was not also scored in either or both parents. Such
apparent non-Mendelian or exclusionary profiles can be due to pollen contamination during
inbreeding, cross contamination in the field or laboratory, scoring errors in the laboratory (e.g.,
scoring +A, predominant stuttering, spectral pull-up, secondary binding sites, or polymer spikes),
or incorrect pedigrees. Another source of apparent exclusion is through the use of a seed source
as a parent that is still heterogeneous due to inbreeding being incomplete. Cycles of inbreeding
then continue so that when those seed sources are used in the future as sources for SSR profiling
to represent the parental genotype they will have lost alleles due to inbreeding that have already
been passed on to a progeny. Alternately, residual heterozygosity within seed sources can result
in low frequencies of heterozygotes or off-type segregants which may, by chance, be sampled in
the progeny but not sampled in the parent. In this study the number of plants sampled to represent
the variety may be insufficient to capture alleles existing at low frequencies within the seed
source. And even if the allele was sampled, it may not have been detected following PCR
amplification due to predominance of the most frequent allele and allelic competition effects.
Hall (2002) has also reported the occurrence of apparent non-parental SSR alleles. Mutation can
also affect SSR profiles. Vigouroux et al. (2002) have estimated mutation rates of 7.7 × 10^-4 per
generation for dinucleotide SSRs and an upper 95% confidence limit of 5.1 × 10^-5 for SSRs with
longer repeat units. A level of error or discrepancy in expected SSR profiles is thus inevitable
for some, if not all, crop plants. We therefore evaluated the robustness of the algorithm and
dataset by rerunning the algorithm using datasets that were simulated to have up to 20%
additional levels of missing plus 20% mis-typed data beyond the level that was received from the
laboratory. The algorithm maintained its initial level of robustness with up to an additional level
of 10% both missing and mis-typed data, provided data from at least 100 SSR loci were used.
Fewer loci (60) were capable of retaining this degree of robustness in the evaluation of the
hybrid pedigree algorithm using maize hybrids (Berry et al. 2002). The loss of parental alleles
that occurs during the inbreeding process, in contrast to their retention in a hybrid progeny
compared to its parents, probably underlies the need to use data from a greater number of loci to
maintain robustness for the inbred algorithm as compared to the hybrid algorithm.
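The robustness test described above depends on adding extra missing and mis-typed calls to an SSR dataset. The text does not say how the simulated datasets were generated, so the following Python sketch is only one plausible way to do it. The function `degrade_profile`, the per-locus treatment of errors and the toy data are all assumptions made for illustration.

```python
import random

MISSING = None  # placeholder for a locus with no usable call


def degrade_profile(profile, known_alleles, missing_rate, mistype_rate, rng):
    """Return a copy of `profile` with extra missing and mis-typed loci.

    profile       : dict mapping locus name -> (allele, allele)
    known_alleles : dict mapping locus name -> list of alleles known at that locus
    missing_rate  : fraction of loci to blank out
    mistype_rate  : fraction of loci in which one allele is swapped for another
    rng           : random.Random instance, so runs are reproducible
    """
    degraded = {}
    for locus, genotype in profile.items():
        roll = rng.random()
        if roll < missing_rate:
            degraded[locus] = MISSING
        elif roll < missing_rate + mistype_rate:
            alleles = list(genotype)
            position = rng.randrange(2)
            alternatives = [a for a in known_alleles[locus] if a != alleles[position]]
            if alternatives:  # a locus with a single known allele cannot be mis-typed
                alleles[position] = rng.choice(alternatives)
            degraded[locus] = tuple(alleles)
        else:
            degraded[locus] = genotype
    return degraded


# Toy data (hypothetical): 200 loci, three known alleles each, one homozygous variety.
loci = [f"ssr{k}" for k in range(200)]
known = {locus: ["a", "b", "c"] for locus in loci}
variety = {locus: ("a", "a") for locus in loci}

noisy = degrade_profile(variety, known, missing_rate=0.10, mistype_rate=0.10,
                        rng=random.Random(1))
n_missing = sum(1 for g in noisy.values() if g is MISSING)
n_changed = sum(1 for l in loci if noisy[l] is not MISSING and noisy[l] != variety[l])
print(n_missing, n_changed)
```

Rerunning the ancestry calculation on `noisy` in place of `variety`, at several error rates and with different numbers of loci, would be the analogue of the robustness evaluation reported in the text.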
It was anticipated that determination of pedigrees following cycles of inbreeding might be more
challenging to accomplish than to determine pedigrees of hybrids, where the total nuclear genetic
contributions of both parents are preserved. Nonetheless, these results show that the algorithm
can be used effectively to identify parents of inbred genotypes. Nearly 90% of soybean
parents were identified. This is a set of genotypes which, due to the relatively narrow founder
base and subsequent cycles of development through the use of related crosses, provides an
extremely rigorous test of the algorithm and of the discriminatory power of the marker data.

Supplementary data also show the capability of the algorithm to identify parents of maize inbreds
that have been developed in a pedigree system using two parents. Use of this algorithm with
currently available codominantly expressed molecular marker data has also been shown to have
practical feasibility because of the high degree of robustness that is evident and which extends
well beyond the realm of aberrant or unexpected marker data that is encountered. These types of
error or unexpected marker data can include laboratory error and sampling effects from the use of
different seed sources for the actual parental source compared to a more inbred source that
becomes available later to represent the parental genotype. This algorithm has application in a
number of fields, including conservation biology and population genetics, and can assist in the
protection of intellectual property rights.
Table 1. Calculations of ancestry for homozygous index inbreds: Cases that must be considered
for example of genotype aa.

SSR   Index   Inbred i   Inbred j
1     aa      aa         aa
2     aa      aa         ax
3     aa      aa         xx
4     aa      ax         ax
5     aa      ax         xx
6     aa      xx         xx

x is any allele different from a, but not missing.
Table 2. Probability of observing the index [P(SSR | i, j)] assuming inbreds i and j are ancestors:
Calculations for SSRs 1 to 6.

SSR   P(SSR | i, j)
1     $p^2(4/4) + p(1-p)(1/2 + 1/n \cdot 1/2) + p(1-p)(1/2 + 1/n \cdot 1/2) + (1-p)^2(1/n)$
2     $p^2(3/4) + p(1-p)(1/2 + 1/n \cdot 1/2) + p(1-p)(1/2 \cdot 1/2 + 1/n \cdot 1/2) + (1-p)^2(1/n)$
3     $p^2(2/4) + p(1-p)(1/2 + 1/n \cdot 1/2) + p(1-p)(1/n \cdot 1/2) + (1-p)^2(1/n)$
4     $p^2(2/4) + p(1-p)(1/2 \cdot 1/2 + 1/n \cdot 1/2) + p(1-p)(1/2 \cdot 1/2 + 1/n \cdot 1/2) + (1-p)^2(1/n)$
5     $p^2(1/4) + p(1-p)(1/2 \cdot 1/2 + 1/n \cdot 1/2) + p(1-p)(1/n \cdot 1/2) + (1-p)^2(1/n)$
6     $p^2(0/4) + p(1-p)(1/n \cdot 1/2) + p(1-p)(1/n \cdot 1/2) + (1-p)^2(1/n)$

The four terms in each case are in order of the four possibilities when inbreds i and j are
ancestors: (1) the alleles of both i and j were passed to the intermediate hybrid, (2) i came
through but not j, (3) j came through but not i, and (4) neither came through. Missing alleles are
not considered.
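The six rows of Table 2 follow one pattern, which can be written as a single function of the number of copies of allele a carried by each candidate inbred. The Python sketch below is an illustration reconstructed from the table. The function name `prob_index_aa` and its argument names are hypothetical, and n is taken to be the number of alleles known at the locus.

```python
def prob_index_aa(a_in_i: int, a_in_j: int, p: float, n: int) -> float:
    """P(SSR | i, j) for a homozygous index inbred of genotype aa.

    a_in_i, a_in_j : copies (0, 1 or 2) of allele a in inbreds i and j
    p              : probability that an ancestor's allele was passed down
    n              : number of alleles known at the locus
    """
    # (1) both passed: the index is fixed for one of the four parental alleles
    both = (a_in_i + a_in_j) / 4
    # (2) only i passed: half the time the index is fixed for i's allele,
    #     half the time for an unspecified allele that is a with chance 1/n
    only_i = 0.5 * (a_in_i / 2) + (1 / n) * 0.5
    # (3) only j passed: same reasoning with the roles swapped
    only_j = 0.5 * (a_in_j / 2) + (1 / n) * 0.5
    # (4) neither passed: the index allele is a with chance 1/n
    neither = 1 / n
    return (p * p * both
            + p * (1 - p) * only_i
            + p * (1 - p) * only_j
            + (1 - p) ** 2 * neither)


# Rows 1 to 6 of Table 1 expressed as (copies of a in i, copies of a in j).
table1_cases = [(2, 2), (2, 1), (2, 0), (1, 1), (1, 0), (0, 0)]
for ssr, (ci, cj) in enumerate(table1_cases, start=1):
    print(ssr, prob_index_aa(ci, cj, p=0.5, n=4))
```

The choice n = 4 in the loop is arbitrary and serves only to produce numbers; in practice n differs from locus to locus.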
Table 3. Probabilities of ancestry and pedigree relationships for soybean varieties where both
parents did not rank above non-parents.

Case no.  Index variety  Rank  Possible ancestor                 Probability
1         95B97          1     Parent 2                          1
                         2     Full sib of parent 1              0.5822
                         3     Parent 1                          0.4124
2         A2943          1     Parent 1                          0.9977
                         2     Multiple backcross of parent 2    0.7999
                         3     Parent 2                          0.1999
3         A4595          1     Parent 1                          1
                         2     Derivative of parent 2            0.9956
                         3     Multiple backcross of parent 2    0.0034
                         4     Derivative of parent 2            0.0006
                         5     Half sib of A4595                 0.0004
                         6     Parent 2                          0.0001
4         Hark           1     Parent 1                          1
                         2     Derivative of parent 2            1
                         3     Derivative of parent 2            2.1E-09
                         4     Derivative of parent 2            [illegible]
                         5     Derivative of Hark                3.1E-10
                         6     Derivative of parent 1            1.1E-13
                         7     Unknown                           3.8E-15
                         8     Derivative of parent [illegible]  4.6E-17
                         9     Derivative of parent 2            4.7E-21
                         10    Parent 2                          2.7E-21
5         Kent           1     Parent 2                          1
                         2     Derivative of parent 1            0.9990
                         3     Derivative of parent 1            0.0011
                         4     Parent 1                          3.0E-04
6         P9583          1     Parent 1                          1
                         2     Full sib of P9583                 0.8801
                         3     Parent 2                          [illegible]
7         P9641          1     Parent 2                          1
                         2     Derivative of P9641               1
                         3     Parent 1                          3.7E-06
8         S30J2          1     Parent 1                          1
                         2     Derivative of parent 2            0.9321
                         3     Parent 2                          0.0679
9         [illegible]    1     Parent 1                          [illegible]
                         2     Half sib of parent 1              1
                         3     Full sib of parent 2              7.9E-09
                         4     Half sib of parent 2              3.3E-09
                         5     Full sib of grandparent           1.2E-10
                         6     Derivative of parent [illegible]  3.0E-11
                         7     [illegible]                       2.0E-11
                         8     [illegible]                       8.7E-12
                         9     Parent [illegible]                1.1E-12
10        YB41Q01        1     Parent 2                          1
                         2     Full sib of parent 1              1
                         3     Full sib of grandparent           7.3E-05
                         4     Full sib of grandparent           4.1E-09
                         5     Parent 1                          9.1E-10

Results for 33 (77%) varieties where both parents were ranked first and second are not included
in this table (see Figures 1 and 2).
Table 4. Probability of ancestry for five individual soybean varieties using SSR data obtained
from different numbers of loci (50, 100, 150, 236).

         L50                     L100                    L150                    L236
Inbred   Possible ancestor Prob  Possible ancestor Prob  Possible ancestor Prob  Possible ancestor Prob

p = 0.5
93B11    XB31C        0.9461     XB31C        1          XB31C        1          XB31C        1
         [illegible]  0.8006     A3415        0.9362     A3415        0.9146     A3415        0.9954
         XB38A01      0.0256     WILLIAMS     0.0429     WILLIAMS     0.0809     WILLIAMS     0.0046
         P9271        0.0251     A3242        0.0155     YB30L01      0.0034     A3242        0
         YB30L01      0.0232     YB30L01      0.0015     A3242        0.0000     DOUGLAS      0
A7986    [illegible]  0.774?     BRAXTON      0.9725     BRAXTON      1          BRAXTON      1
         XB63D00      0.2841     YOUNG        0.5302     YOUNG        0.8910     YOUNG        0.9929
         S6262        0.1826     COOK         0.3872     P9641        0.0404     XB63D00      0.0071
         YOUNG        0.1755     XB63D00      0.0496     XB63D00      0.0254     [illegible]  [illegible]
         BRAXTON      0.1065     [illegible]  0.0328     COOK         0.0245     P9641        0
P9443    DOUGLAS      0.?036     A3415        0.5557     FAYETTE      0.8760     FAYETTE      0.9885
         A3415        0.7629     FAYETTE      0.4957     A3415        0.7034     DOUGLAS      0.8847
         WILLIAMS     0.0887     DOUGLAS      0.4855     CX399        0.1671     A3415        0.0846
         YALE         0.0501     CX260C       0.2032     CX260C       0.1273     WILLIAMS     0.0348
         P9394        0.0411     WILLIAMS     0.160?     WILLIAMS     0.0948     CX399        0.0062
S38T8    S3533        0.8711     [illegible]  0.9993     S3533        1          [illegible]  1
         S4644        0.4543     S4644        0.9988     S4644        1          [illegible]  [illegible]
         YB44R01      0.2762     YB40M01      0.0012     YB37Y00      0          [illegible]  0
         YB40M01      0.1087     YB44R01      0.0004     93B65        0          [illegible]  0
         YB44Q01      0.0325     YB37Y00      0.0001     A4268        0          YB37Y00      0
YOUNG    DAVIS        0.6589     [illegible]  0.6551     DAVIS        0.6324     DAVIS        0.9752
         XB63D00      0.4942     ESSEX        0.5979     P9641        0.5524     [illegible]  0.5397
         96B32        0.3122     P9641        0.3409     COOK         0.3231     ESSEX        0.3273
         COOK         0.0707     COOK         0.1692     ESSEX        0.2817     96B32        0.1299
         OGDEN        0.0606     96B32        0.1315     96B32        0.1933     COOK         0.0235

p = 0.99
93B11    XB31C        1          XB31C        1          XB31C        1          [illegible]  1
         A3415        [illegible] [illegible] 0.9999     A3415        1          A3415        1
         A3242        0.0001     [illegible]  0.0001     P9443        0          WILLIAMS     [illegible]
         P9443        0          A3242        0          A3242        0          [illegible]  [illegible]
         WILLIAMS     0          WILLIAMS     0          WILLIAMS     0          FAYETTE      0
A7986    BRAXTON      1          BRAXTON      1          BRAXTON      1          BRAXTON      1
         YOUNG        0.9903     YOUNG        0.9903     [illegible]  0.9987     YOUNG        1
         P9641        0.0092     P9641        0.0092     96B32        0.0012     XB63D00      0
         96B32        0.0005     96B32        0.0005     P9641        0.0002     96B32        0
         DAVIS        0          DAVIS        0          [illegible]  0          P9641        0
P9443    DOUGLAS      0.9998     DOUGLAS      0.9999     FAYETTE      0.9995     DOUGLAS      1
         FAYETTE      0.7010     FAYETTE      0.7011     DOUGLAS      0.9993     FAYETTE      1
         CX260C       0.2345     CX260C       0.2345     CX399        0.0006     CX260C       0
         A3415        0.0644     A3415        0.0643     A3415        0.0005     CX399        0
         P9394        0.0001     P9394        0.0001     P9394        0.0001     A3415        0
S38T8    S3533        1          S3533        1          S3533        1          S3533        1
         S4644        1          S4644        1          S4644        1          S4644        1
         YB40M01      0          YB40M01      0          93B67        0          A4268        0
         YB44R01      0          [illegible]  [illegible] ST3780      0          YB54J00      0
         93B67        0          YB44R01      0          YB37Y00      0          YB44R01      0
YOUNG    DAVIS        1          DAVIS        1          DAVIS        1          DAVIS        1
         ESSEX        [illegible] [illegible] [illegible] ESSEX       1          [illegible]  1
         P9641        0          P9641        0          COOK         0          [illegible]  0

A question mark replaces a digit that cannot be read in the source scan; [illegible] marks a cell
that cannot be recovered.
ABSTRACT

Determination of parentage is fundamental to the study of biology and to applications such as the
identification of pedigrees. Limitations to studies of parentage have stemmed from the use of an insufficient
number of hypervariable loci and mismatches of alleles that can be caused by mutation or by laboratory
error and that can generate false exclusions. Furthermore, most studies of parentage have been limited
to comparisons of small numbers of specific parent-progeny triplets thereby precluding large-scale surveys
of candidates where there may be no prior knowledge of parentage. We present an algorithm that can
determine probability of parentage in circumstances where there is no prior knowledge of pedigree and
that is robust in the face of missing data or mistyped data. We present data from 54 maize hybrids and
586 maize inbreds that were profiled using 195 SSR loci including simulations of additional levels of
missing and mistyped data to demonstrate the utility and flexibility of this algorithm.



DETERMINATION of parentage is fundamental to
the study of reproductive and behavioral biology.
The increasing availability of highly discriminant genetic
markers for many diverse species provides the
potential to uniquely characterize individuals at numerous
loci and to unambiguously resolve parentage where
genealogical relationships are unknown, in error, or in
dispute.

Identification of parent-progeny relationships in wild
populations of animals and plants provides insights into
the success of various reproductive strategies (Ellstrand
1984; Smouse and Meagher 1994; Alderson
et al. 1999) and has allowed for the implementation
of management programs to conserve genetic diversity
(Miller 1975; Rannala and Mountain 1997). The
association of pedigree with physical appearance or performance
in domesticated animals and plants allows
parents that have contributed favorable alleles for desirable
traits through selective breeding programs to be
identified (Bowers and Meredith 1997; Sefc et al.
1998; Vankan and Faddy 1999). These applications of
associative genetics facilitate further progress in genetic
improvement through breeding. Establishment of parentage
is also useful to secure legal rights of guardianship
in humans, to help protect intellectual property in
plant varieties, to validate breed pedigrees of domesticated
animals, to protect stocks of fish, and to identify
provenance of meat that is available in supermarkets
(Götz and Thaller 1998; Primmer et al. 2000; White
et al. 2000).

1 Corresponding author: Department of Biostatistics, The University of
Texas M. D. Anderson Cancer Center, 1515 Holcombe Blvd., Box
447, Houston, TX 77030-4009. E-mail: dberry@mdanderson.org

Most studies of pedigree have utilized exclusion analysis
where the molecular marker genotypes of either one
or a restricted number of potential triplets of offspring
and putative parents are compared. Often the identity
of the mother is not in question; the maternal profile
is subtracted from that of the offspring and the deduced
paternal profile is then compared with candidate father
genotypes (Ellstrand 1984; Hamrick and Schnabel
1985). Individuals who could not have contributed the
paternal genotype are excluded; the remainder are possible
parents. Nonpaternity in humans is generally declared
only on the basis of exclusions exhibited by at
least two unlinked and independent loci. This criterion
of exclusion reduces the likelihood of a false declaration
of nonpaternity on the basis of marker results that are
actually due to mutation within the phylogeny. Bein et
al. (1998) show that evidence of nonpaternity should
require exclusions at loci on different chromosomes to
avoid erroneous conclusions that would be made due
to nondisjunction at meiosis leading to uniparental inheritance.
A requirement for at least three independent
exclusions to declare nonpaternity in humans has also
been instituted (Gunn et al. 1997). In studies of natural
populations of animals or plants where numerous
parent-progeny triplets are examined it is usual to accept
a single exclusionary event as evidence of nonpaternity
(Marshall et al. 1998). Paternity testing has been extended
to situations where DNA from either parent is
unavailable. For example, paternity can still be established
in circumstances where the putative father is deceased
but his parents are still alive (Helminen et al.
1991; Bockel et al. 1992). Chakraborty et al. (1994)
demonstrate that paternity can be determined in cases
where the mother is unavailable for testing. Lang et
al. (1993) partially reconstructed the DNA profile of a
missing crocodile parent using profiles of the mother
and progeny.

Chakraborty et al. and Smouse and Meagher
(1994) report that reliance upon exclusion alone has
usually failed to unambiguously resolve paternity. Limitations
have stemmed from the use of an insufficient
number of independent hypervariable loci. Other statistical
methods are therefore required to calculate the
likelihood of paternity for each nonexcluded male
(Berry and Geisser 1986; Meagher 1986; Meagher
and Thompson 1986; Thompson and Meagher 1987;
Devlin et al. 1988; Berry 1991). Marshall et al. (1998)
draw attention to the quality of data that is encountered
practically in genotypic surveys. Maternal genetic data
may or may not be available, data may be absent for
some candidate males, data may be missing for some
loci in some individuals, null alleles exist, and typing
errors occur. Reconstructing or validating the pedigrees
of varieties of cultivated plants often provides additional
challenges because their phylogenies can reveal apparent
exclusions that masquerade as non-Mendelian inheritance.
For example, apparent exclusions can occur
in circumstances where an individual is used as a parent
prior to completion of the inbreeding process. The development
of parent and progeny then continues on
parallel but separate tracks, thereby allowing the possibility
that alleles that are subsequently lost through inbreeding
in the parent can still become fixed in the
progeny. It is also possible to create many offspring from
a single mating and to use the same parent repeatedly
in "backcrossing." Therefore, many individual inbred
lines, varieties, or hybrids can be highly related. In consequence,
there are numerous (and often very similar)
pedigrees. The effective number of marker loci that can
discriminate between alternate pedigrees is proportionally
reduced as parents are increasingly related. Consequently,
inbred lines can be more similar to one or
more sister or other inbreds than these inbreds are to
one or both of their parents.

It has not been usual to search among hundreds of
individuals to identify the most probable maternal and
paternal candidates for a specific progeny. Most studies
of parentage are in circumstances where there is a priori
information for at least one of the parents (usually the
maternal parent). Limited availability of marker loci and
the lack of very high-throughput genotyping systems
offering inexpensive datapoint costs may have focused
research on studies that involve relatively few individuals
and where there is at least some a priori indication of
parentage. Studies that have been conducted without a
priori information on parentage include species where
reproductive behavior renders identification of the maternal
parent difficult or impossible. Examples include
those undertaken on birds that practice brood parasitism
(Alderson et al. 1999) or extra-pair copulation
(Wetton et al. 199?) or on species such as the wombat
that are difficult to observe in the wild (Taylor et al.
1997).

Two circumstances favor a revised approach to the
statistical analysis of pedigree. First, molecular marker
technologies are rapidly developing and will allow numerous
loci to be typed for thousands of individuals
rapidly and inexpensively. A greater number and diversity
of larger-scale studies of pedigree can be expected
within the plant and animal kingdoms, including individuals
for which there is no prior knowledge of pedigree.
A larger number of markers means a greater chance
for errors. Therefore, the second circumstance follows:
Procedures that are efficient and robust in the face of
apparent exclusions, missing data, and laboratory error
are required.

The purpose of this article is to describe and evaluate
a methodology that can be used to quantify the probability
of parentage of hybrid genotypes. We focus on parentage
because it is the primary focus of published literature
and it is the easiest level of ancestry to understand.
The method is robust in the face of mutation, pseudo-non-Mendelian
inheritance (apparent exclusions) due
to residual heterozygosity in parental seed sources, missing
data, and laboratory error. The methodology has a
number of advantages: (i) It can accommodate large
datasets of possible ancestors (hundreds of inbreds or
hybrids each profiled by >100 marker loci), (ii) it does
not require prior knowledge about either parent of the
hybrid of interest, (iii) it does not require independence
of the markers, and (iv) it can successfully discriminate
between many highly related and genetically similar genotypes.
We demonstrate the effectiveness of this approach
to identify inbred parents of maize (Zea mays
L.) hybrids using simple sequence repeat (SSR) marker
profiles for 54 maize hybrids together with their parental
and grandparental genotypes included among a total
of 586 inbred lines. The methodology is applicable to
the investigation of parentage for all progeny developed
from parental mating without subsequent generations
of inbreeding.

MATERIALS AND METHODS

Algorithm: Consider an index hybrid whose parentage is
unknown or in dispute. Inbreds in an available database are
possible ancestors of the hybrid. The objective is to find the
probabilities of closest ancestry for each inbred on the basis
of information from SSRs from the index hybrid and the
inbreds. There is no reason to trim the database by removing
inbreds thought to be unrelated to the index hybrid because
their lack of relationship will be discovered.

Consider a pair of possible ancestors, inbred i and inbred
j. There is nothing special about this particular pair as all
pairs will be treated similarly. The process involves calculating
the probability that inbreds i and j are in the hybrid's ancestry,
repeating this for all pairs of inbreds in the database.

The basis of the algorithm is Bayes' rule (e.g., Berry 1991,
1996). Let P(i, j | SSRs) stand for the (posterior) probability
that i and j are ancestors of the index hybrid given the information
from the various SSRs. Let P(i, j) stand for the unconditional
(or prior) probability of the same event. Finally,
P(SSRs | i, j) is the probability of observing the various SSR
results if in fact i and j are ancestors. Bayes' rule says

$$P(i, j \mid \mathrm{SSRs}) = \frac{P(\mathrm{SSRs} \mid i, j)\, P(i, j)}{\sum_{u,v} P(\mathrm{SSRs} \mid u, v)\, P(u, v)},$$

where the sum in the denominator is over all pairs of inbreds,
indexed by u and v. P(SSRs | i, j) × P(i, j) is one of the terms
in the denominator. (To compute the denominator in the
above expression, fix a particular order to the inbreds in the
database and take u ≤ v in expressions involving the pair (u,
v).) If there are 586 inbreds, for example, then the number
of pairs and the number of terms in the denominator is
586(587)/2 = 171,991. Inbreds i and j may be parents or
grandparents or other types of relation or bear no relationship
at all to the hybrid. If there are more than two ancestors
in the database, such as both parents and all four grandparents,
then the possible pairs involving these ancestors will
generally have the highest posterior probabilities. If the hybrid's
true parents are in the database, then as a pair they will
typically have the highest overall posterior probability. If both
i and j happen to be related to one particular parent of the
hybrid, then as a pair their posterior probability will be low
because they will not usually account for many of the alleles
that are contributed by the other parent of the hybrid.

We will make the "no-prior-information" assumption that
P(u, v) is the same for all pairs (u, v). This implies that this
factor is cancelled from both numerator and denominator in
the above expression, giving:

$$P(i, j \mid \mathrm{SSRs}) = \frac{P(\mathrm{SSRs} \mid i, j)}{\sum_{u,v} P(\mathrm{SSRs} \mid u, v)}.$$
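Under the flat prior, the posterior for a pair is simply its likelihood divided by the sum of the likelihoods of all pairs. The Python sketch below illustrates that normalization and checks the pair count quoted in the text. The function names, the three-inbred database and the likelihood values are all made up for illustration, and the sketch assumes that the count 586(587)/2 refers to unordered pairs in which an inbred may also be paired with itself.

```python
from itertools import combinations_with_replacement


def count_pairs(num_inbreds):
    """Number of unordered pairs (u, v) with u <= v."""
    return num_inbreds * (num_inbreds + 1) // 2


def posterior_over_pairs(likelihoods):
    """Turn P(SSRs | u, v) into P(u, v | SSRs) under a flat prior on pairs."""
    total = sum(likelihoods.values())
    if total <= 0:
        raise ValueError("at least one pair must have a positive likelihood")
    return {pair: value / total for pair, value in likelihoods.items()}


# Hypothetical three-inbred database with invented likelihood values.
inbreds = ["A", "B", "C"]
pairs = list(combinations_with_replacement(inbreds, 2))
made_up = [1e-9, 4e-3, 2e-6, 1e-8, 1e-3, 5e-7]
likelihoods = dict(zip(pairs, made_up))

posterior = posterior_over_pairs(likelihoods)
best_pair = max(posterior, key=posterior.get)
print(count_pairs(586), best_pair)
```

In a real application each likelihood is a product over many loci and can underflow, so working with log-likelihoods before normalizing would be a sensible precaution. That refinement is a suggestion of this sketch and is not described in the text.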

The problem is then to calculate a typical P(SSRs | i, j). Assume
inbreds i and j are both ancestors. We calculate the probability
of observing the resulting hybrid under this assumption. We
make no assumptions about relationships among the various
inbreds. Other possible ancestors will be considered implicitly
in the calculation by allowing their alleles to be introduced
through breedings with i and j. However, the nature of such
breedings is not specified. Suppose inbred i's alleles are (a,
b). Each descendant of inbred i receives one of these two
alleles or not. An immediate descendant receives one with
probability 1 (barring mutations). A second generation descendant
receives one of them with probability 0.5. And so
on. Since degree of ancestry (if any) is unknown, we label the
actual probability of passing on one of these alleles to be P.
Similarly, an allele from inbred j has been passed down to the
hybrid or not, and the probability of the former is P. In the
following, P will be taken to equal 0.50, although we will also
consider P = 0.99 in some of the calculations.

Assuming P = 0.50 is consistent with the closest ancestors
in the database being grandparents. However, we are not
interested in grandparents per se. If the closest ancestors in
the database were parents, then as indicated above P should
equal 1 (ignoring mutations and laboratory errors). Our primary
concern is when the parents are not in the database. In
this case P is no greater than 0.50. Assuming P = 0.50 is robust
over the middle range of possible values of P. One way in
which it is robust is if there may be mutations and laboratory
errors, in which case P would have to be <1. Taking P to
equal 0.50 levies little penalty against a particular pair in which
there is an apparent exclusion from direct parentage. Therefore
taking P to be <1 means that if the true parents are in
the database then they will not be ruled out if there happen
to be mutations and laboratory errors. And if the closest ancestors
in the database are more remote than grandparents, they
are likely to be identified because they will usually have the
fewest mismatches of the lines considered.

When i and j are ancestors there are four possibilities: (1)
The alleles of both inbreds i and j were passed to the hybrid,
(2) inbred i came through but not inbred j, (3) inbred j
came through but not inbred i, and (4) neither inbred came
through. Assuming independence, these have respective probabilities
$P^2$, $P(1 - P)$, $P(1 - P)$, $(1 - P)^2$. In the case P =
0.50 all of these probabilities equal 0.25.
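The four case probabilities can be tabulated for the two values of P used in this study. The Python sketch below is purely illustrative, and the name `case_weights` is hypothetical.

```python
def case_weights(p):
    """Probabilities of the four pass-down cases, assuming independence."""
    return {
        "both": p * p,              # (1) alleles of i and j both passed down
        "only_i": p * (1 - p),      # (2) i came through, j did not
        "only_j": (1 - p) * p,      # (3) j came through, i did not
        "neither": (1 - p) ** 2,    # (4) neither came through
    }


for p in (0.50, 0.99):
    print(p, case_weights(p))
```

At P = 0.99 almost all of the weight falls on the first case, which is why that setting behaves much like a strict test of direct parentage while still leaving a small allowance for mismatches.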
An instance of the law of total probability (Sec. 5.3, Berry
1996) is that the probability of observing a hybrid's alleles is
the average of the conditional probabilities of this event given
the above four cases. The simplest of the four cases is the
first possibility: Assuming the hybrid's alleles are passed down
directly from both inbreds, the probability of observing the
hybrid's genotype is either 1 or 0 depending on whether the
hybrid shares both inbreds' alleles. (It is especially easy when
both inbreds are homozygous.) The other three cases require
an assumption regarding the possibility that an inbred's allele
is not passed to the hybrid but is interrupted by a mutation,
a laboratory error, or intervening breeding. We regard such
an allele as being selected from all known alleles with probability
1/(number of alleles), where the number of alleles is the
total number of alleles known to exist at the locus in question.
An alternative approach would be to use the allelic proportions
that are present in the database (or in another database).
However, the lines in the database may not be randomly selected
from any population. For example, a line that has been
highly used in breeding would have many derivative lines in
the database, in which case the frequencies of its alleles will
be artificially inflated. Assuming equal probabilities for the
various alleles at a given locus is robust in the sense that it is
not affected by adding and dropping lines from the database.

There are many cases to consider when computing the probability of observing a hybrid's alleles, depending on the zygosity of the hybrid and the inbreds, and allowing for the possibility of missing alleles or "extra alleles" in the assessment of the hybrid and inbred genotypes. These possibilities are too numerous to list. Instead we give three simple examples. All the examples have homozygous inbreds, the most common case. And each of the three hybrids has two alleles, again the most common case. We suppose that the measured alleles for three SSRs and a particular trio of hybrid and ancestor inbreds are as we have indicated in Table 1.

For SSR 1 there are three known alleles, one in addition to alleles a and b that are listed for the three lines (hybrid, inbred i, and inbred j) in Table 1. For SSR 2 and SSR 3 there are two known alleles in addition to those listed. The calculations in the right half of Table 1 will now be explained. Implicit in calculating P(SSR|i,j) is the assumption, required in both the numerator and denominator of Bayes' rule, that inbreds i and j are ancestors of the hybrid. Consider SSR 1. In case 1 above, both ancestors' alleles (as measured by the laboratory process) are assumed to pass to the index hybrid, and so in this case the hybrid is necessarily ab. The probability of observing the actual hybrid genotype is 1 for case 1, as shown in Table 1. In case 2 we assume that inbred i's allele passes to the hybrid but inbred j's does not. Indeed, the hybrid has an a allele. The probability of observing b as the other allele is 1/(number of alleles) = 1/3, as shown in Table 1. Case 3 is similar. In case 4, neither ancestor allele is passed to the hybrid; the probability of observing the hybrid's genotype (or any heterozygous genotype) is 2(1/3)(1/3) = 2/9. When P = 0.50, the overall (unconditional) probability in the rightmost column (17/36) is the simple average of the four cases, as indicated in Table 1.
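The SSR 1 arithmetic above can be checked with a short sketch. It is illustrative only and covers just the situation described in the text: both inbreds homozygous, a two-allele hybrid, and a non-transmitted allele drawn uniformly from the known alleles. The function names and the genotype encoding are assumptions made for this example, not part of the original method's software.

```python
from fractions import Fraction


def case_probabilities(allele_i, allele_j, hybrid, n_alleles):
    """Conditional probabilities of the hybrid genotype under cases 1-4.

    Assumes both inbreds are homozygous (allele_i, allele_j) and that an
    allele which is not passed down is a uniform draw from n_alleles.
    """
    h = sorted(hybrid)
    u = Fraction(1, n_alleles)

    def one_passed(passed):
        # The passed allele must appear in the hybrid; the hybrid's other
        # allele is then a uniform draw with probability 1/n_alleles.
        return u if passed in h else Fraction(0)

    case1 = Fraction(int(h == sorted((allele_i, allele_j))))
    case2 = one_passed(allele_i)            # i passed, j not
    case3 = one_passed(allele_j)            # j passed, i not
    case4 = (2 if h[0] != h[1] else 1) * u * u  # neither passed
    return case1, case2, case3, case4


def p_ssr_given_pair(allele_i, allele_j, hybrid, n_alleles, p):
    """Law of total probability over the four cases, weights from P."""
    c1, c2, c3, c4 = case_probabilities(allele_i, allele_j, hybrid, n_alleles)
    return p * p * c1 + p * (1 - p) * (c2 + c3) + (1 - p) ** 2 * c4


half = Fraction(1, 2)
print(p_ssr_given_pair("a", "b", ("a", "b"), 3, half))  # SSR 1 in the text: 17/36
```

With P = 1/2 every case has weight 1/4, so the result is the simple average described in the text. Other values of P only change the weights.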

For SSR 2 and SSR 3 the calculations are similar. For SSR 2 there is some evidence against pair (i,j) being ancestors, but it is not conclusive. For SSR 3 there is even less evidence favoring pair (i,j). It would not take many SSRs with evidence similar to that for SSR 3 to essentially rule out this pair, provided that other pairs are not similarly inconsistent.

TABLE 1
Probability of observing a hybrid's alleles using three sample SSRs and four possible combinations (cases) of alleles passed, assuming that inbreds i and j are ancestors of the hybrid

                                             Probability of observing the hybrid's genotype
     No. of                                  Case 1    Case 2     Case 3     Case 4    Overall probability
SSR  alleles  Hybrid  Inbred i  Inbred j     i and j   i, not j   j, not i   neither   P(SSR|i,j)
1    3        ab      aa        bb           1         1/3        1/3        2/9       17/36
2    5        [illegible in source]          0         1/5        0          2/25      7/100
3    6        [illegible in source]          0         0          0          2/36      2/144

SSR, simple sequence repeat marker profile.
To find the overall P(SSRs|i,j), multiply the individual P(SSR|i,j) over the various SSRs. There are purely computational issues to address. Each P(SSR|i,j) is a number between 0 and 1. When there are a great many SSRs, the product of these numbers will be vanishingly small. To lessen problems with computational underflow, for each SSR we multiply P(SSR|u,v) by the same constant for each pair (u,v): the inverse of the largest possible such probability. For example, since 17/36 is the largest probability for a heterozygous hybrid at an SSR having three alleles (as is the case for SSR 1 in Table 1), we multiply all factors P(SSR|u,v) by 36/17. To eliminate remaining problems with underflow, we do calculations using logarithms (adding instead of multiplying) and take antilogs at the end.
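The underflow problem and the two remedies (rescaling each factor, then working in logarithms) can be illustrated with a minimal sketch. The 400 identical factors and the per-SSR maxima below are made-up values chosen only to force underflow; the function name is hypothetical.

```python
import math


def scaled_log_likelihood(per_ssr_probs, per_ssr_max):
    """Sum of log(p / p_max) over SSRs; -inf if any SSR rules the pair out."""
    total = 0.0
    for p, p_max in zip(per_ssr_probs, per_ssr_max):
        if p == 0.0:
            return -math.inf  # one impossible SSR makes the product zero
        total += math.log(p) - math.log(p_max)
    return total


# Hypothetical inputs: 400 SSRs, each contributing 7/100, each with an
# assumed largest possible per-SSR probability of 17/36.
probs = [7 / 100] * 400
maxima = [17 / 36] * 400

print(math.prod(probs))                      # direct product underflows to 0.0
print(scaled_log_likelihood(probs, maxima))  # finite sum of logs
```

Because every pair (u,v) is scaled by the same constant at a given SSR, the scaling cancels when the probabilities are later normalized over pairs.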

The probability P(SSRs|u,v) is calculated for all (u,v) pairs and summed over all possible pairings in the database, including that for the inbred pair under consideration: (i,j). This gives the denominator in the expression for P(i,j|SSRs). To determine the probability that any particular inbred, say inbred i, is the closest ancestor of the index hybrid, sum P(i,v|SSRs) over all inbreds v with v ≠ i. Call this P(i|SSRs). The maximum of P(i|SSRs) for any inbred i is 1. But since there is one closest ancestor on each side of the family, the sum of P(i|SSRs) over all inbreds i is 2. If there is a particular pair for which P(i,j|SSRs) is close to 1 then both P(i|SSRs) and P(j|SSRs) separately will be close to 1.
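The normalization over pairs and the per-inbred sums can be sketched as follows. This is a hedged illustration, not the authors' program: it assumes every pair has the same prior probability, takes per-pair log likelihoods as input, and requires at least one pair with a finite log likelihood. The inbred labels and log-likelihood values are invented.

```python
import math


def pair_posteriors(log_lik):
    """Normalize per-pair log likelihoods into P(u,v|SSRs), equal priors assumed."""
    m = max(log_lik.values())  # subtract the maximum before exponentiating
    weights = {pair: math.exp(ll - m) for pair, ll in log_lik.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def inbred_marginals(posteriors):
    """P(i|SSRs): sum of P(i,v|SSRs) over all partners v of inbred i."""
    marginals = {}
    for (u, v), p in posteriors.items():
        marginals[u] = marginals.get(u, 0.0) + p
        marginals[v] = marginals.get(v, 0.0) + p
    return marginals


# Hypothetical log likelihoods for the three unordered pairs of inbreds A, B, C.
log_lik = {("A", "B"): -10.0, ("A", "C"): -12.0, ("B", "C"): -30.0}
post = pair_posteriors(log_lik)
marg = inbred_marginals(post)
print(post)
print(marg, sum(marg.values()))
```

Each pair contributes its posterior to two inbreds, which is why the marginals add up to 2 rather than 1.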
SSR data: DNA was extracted from 54 maize hybrids and from 556 maize inbreds. All of the hybrids and most inbreds are proprietary products of Pioneer Hi-Bred International; some important publicly bred inbred lines were also included. The inbred parents and grandparents of each hybrid were included within the set of inbreds. Other inbreds that were genotyped include many that are highly related by pedigree to parents and grandparents of the hybrids. The hybrids were chosen because each has a pedigree that is known to us and collectively they represent a broad array of diversity of maize germplasm that is currently grown in the United States ranging from early to late maturity.

A total of 195 SSR loci were used in this study following procedures described in Smith et al. (1997), but modified as described below. SSR loci were chosen on the basis that they individually have been shown to have a high power of discrimination among maize inbred lines and collectively they provide for a sampling of diversity for each chromosome arm. Of these SSR loci, the following numbers (in parentheses) were located on individual maize chromosomes as follows: 1 (35), 2 through 9 (counts illegible in the source), and 10 (14); 17 SSR loci have not yet been mapped. The correlations among the loci are unknown and are irrelevant for our methodology.

Sequence data for primers that allow many of these (and other) SSR loci to be assayed are available at website http://www.agron.missouri.edu. All primers were designed to anneal and amplify under a single set of conditions for PCR. Genomic DNA (10 ng) was amplified in 1.5 mM MgCl2, 50 mM KCl, 10 mM Tris-Cl (pH 8.3) using 0.3 units AmpliTaq Gold DNA polymerase (PE Corporation), oligonucleotide primer pairs (one primer of each pair was fluorescently labeled) at 0.17 µM, and 0.2 mM dNTPs. This mixture was incubated at 95° for 10 min (hot start); amplified using 45 cycles of denaturation at 95° for 50 sec, annealing at 60° for 30 sec, extension at 72° for 85 sec; and then terminated at 72° for 10 min. A water bath thermocycler manufactured at Pioneer Hi-Bred International was used for PCR reactions. PCR products were prepared for electrophoresis by diluting 3 µl of each product to a total of 27 µl using a combination of PCR products generated from other loci for that same maize genotype (multiplexing) and/or dH2O. Dilution of 1.5 µl of this mixture to 5 µl with gel loading dye was performed; it was then electrophoresed at 1700 V for 1.5 hr on an ABI model 377 automated DNA sequencer equipped with GENESCAN software v. 3.0 (PE-Applied Biosystems, Foster City, CA).

PCR products were sized automatically using the local Southern sizing algorithm (Elder and Southern 1987). After sizing of PCR products using GeneScan, alleles were assigned using Genotyper software (PE-Applied Biosystems).

Generally, allele assignations for each locus were made on the basis of histogram plots consisting of 0.5-bp bins. Breaks between the histogram plots of >1 bp were generally considered to constitute separation between allele bins; however, other criteria, such as the presence of the nontemplate-directed addition of adenine (+A addition) and naturally occurring 1-bp alleles, were used on a marker-by-marker basis to define the allele dictionary. All allele scores were made without knowing the identities of the maize genotypes.
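The general gap rule for separating allele bins can be sketched as below. This is only an illustration of the ">1 bp break" criterion; it deliberately ignores the marker-specific criteria (+A addition, naturally occurring 1-bp alleles) mentioned in the text, and the function name and fragment sizes are hypothetical.

```python
def bin_alleles(sizes_bp, min_gap_bp=1.0):
    """Group fragment sizes into allele bins, splitting at gaps > min_gap_bp."""
    bins = []
    for size in sorted(sizes_bp):
        if bins and size - bins[-1][-1] <= min_gap_bp:
            bins[-1].append(size)   # close enough to extend the current bin
        else:
            bins.append([size])     # a break of more than min_gap_bp starts a new bin
    return bins


# Hypothetical fragment sizes (bp) observed at one locus across several lines.
sizes = [100.1, 102.4, 100.4, 102.2, 110.0, 100.3]
print(bin_alleles(sizes))
```

In practice a marker with genuine 1-bp alleles would need a smaller threshold or manual review, which is the marker-by-marker adjustment the text describes.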

RESULTS

Table 2 presents the probability of closest ancestry of the top five ranking inbred lines for each of 5 hybrids at P = 0.50 (Table 2A) and P = 0.99 (Table 2B). Probabilities of ancestry are shown for all 54 hybrids and the top ranking inbreds in Figure 1: P = 0.50 (Figure 1a) and P = 0.99 (Figure 1b). Results for the hybrids presented in Table 2 are featured at the top of Figure 1.

TABLE 2
Probability of ancestry of five hybrids using data obtained from 50, 100, and 195 SSR loci

[Table body not recoverable from the source scan. Layout: for each of 50, 100, and 195 loci, the columns Inbd., Prob., and SE list the five top-ranking inbreds for hybrids 3417, 3525, 3556, 3905, and 3940. Panel A assumes P = 0.50; panel B assumes P = 0.99.]

Hybd., hybrid; Inbd., inbred; Prob., probability; SE, standard error, referring to the variability in the results of the runs; P1, parent one; P2, parent two; SP1/SP2, full sibling of parent one/parent two; DiP1/DiP2, derivatives of parent one, parent two, index i for distinct inbred lines; DP1P2, derivatives of both parent one and parent two.

FIGURE 1. (a) Probabilities of ancestry, assuming P = 0.50, for 54 hybrids and top ranking inbreds (those with probability of ancestry at least a threshold that is illegible in the source). (b) Probabilities of ancestry, assuming P = 0.99, for all 54 hybrids and top ranking inbreds (those with probability of ancestry at least a threshold that is illegible in the source).

When the algorithm used P = 0.50, the two correct parents were identified as highest in probability for 48 (89%) hybrids (Figure 1). For each of 6 hybrids (3893, [identifiers illegible], 3914, and X0915A) one parent ranked in the top two places. The other parent was supplanted either by a sister inbred or by an inbred that was a direct progeny of that parent. Overall, 102 (94%) of 108 parental inbreds were correctly identified. For hybrids where both parents ranked first or second, the probabilities for parental lines that ranked first from among all other inbreds ranged from 1.0000 to [value illegible]; those for parental lines ranking second ranged from 1.0000 to [value illegible]. For 35 hybrids, both parents had probabilities of ancestry in excess of 0.990. Probabilities of ancestry for nonparents that ranked in first or second places were from [value illegible] to 0.7054. For the majority of hybrids, the probability of the third and highest ranked nonparental inbred was at or below E-06. This indicates that there is usually very little uncertainty about closest ancestors.

When the algorithm used P = 0.99 to examine each of the 54 hybrids, both parents were correctly identified for 52 (96%) of hybrids and for 98% (106/108) of the parents across all hybrids (Figure 1). Two hybrids (3914 and X0915A), in which one parent was not ranked in the top two, were also in the subset not ranked in the top two assuming P = 0.50 (above). In both cases their ranks improved (both to third rank) and the actual parent was supplanted by an inbred that was a direct progeny of the corresponding parental line. For 49 hybrids, both parents had probabilities of ancestry in excess of 0.999. Among the 5 hybrids having a parent ranking second with a probability of ancestry below 0.999, the lowest of these probabilities was 0.8976 and the highest probability for a third ranking nonparent was 0.1023. For most hybrids the probability for the third and highest ranked nonparental inbred was at or below E-10.

Table 2 also addresses data analysis in circumstances where heterozygous loci occur in inbred lines or where a hybrid is scored for the presence of more than two alleles per locus. The presence of more than a single allele per locus in inbred lines is an infrequent occurrence in well-maintained inbred development and seed increase programs but is possible because loci can still be segregating and unintended pollination from genotypes not designated as parents of the hybrid can occur. For hybrids, more than two alleles per locus can be scored when DNA is extracted from a bulk of individual plants and because inbred parents are not homozygous due either to residual heterozygosity or to contamination or because one or more direct parents of the hybrid are themselves hybrids. The presence of more than one allele per locus in an inbred line and more than two alleles per locus in a hybrid therefore can be accommodated by multiple runs of the algorithm, each with a random choice of two alleles per locus. Consequently, standard errors in the case of analyzing data from 195 loci tend to be very small because there were few loci where an inbred or hybrid sample (from a bulk of individual plants) was scored for more than two alleles.

Marshall et al. (1998) have drawn attention to errors that can be encountered in genotyping surveys. These errors include missing data, null alleles, and typing errors. We therefore investigated the robustness of the algorithm by examining the effects of modifications in the data for five hybrids (3417, 3525, 3556, 3905, and 3940). First, we reduced the number of SSRs used, from the full set of 195 to 100 and then to 50 (Table 2). Use of 50 loci generated incorrect rankings of one parent for each of two hybrids (identifiers illegible in the source) and for both parents of one hybrid (3905). All of these most highly ranked nonparental inbreds were closely related to the true parents for each of the respective hybrids; six different inbred lines were involved. Four were direct progeny of the true parents (one with additional backcrosses from the true parent) and two were full sisters (from a cross of highly related inbreds) of the actual parent of the hybrid. Using 100 loci resulted in correct parental rankings for all hybrids except for 3905, where neither parent ranked in first or second place. Four inbreds outranked the true parents of 3905. All four nonparents were closely related to the respective true parents; three were direct progeny of the true parent of the hybrid (one with additional backcrossing to that parent) and one was a full sister of the true parent. Use of data from all 195 loci corrected the placement for one of the parents of hybrid 3905. Two inbreds that were not parents of this hybrid remained ranked more highly than one of the true parents. Both were direct progeny of that parent, and one of these inbreds had additional backcrossing to that parent in its pedigree.

To address the consequences of laboratory and other sources of error, we artificially compromised data quality beyond the level originally provided by eliminating specific proportions of alleles that had been scored (establishing scenarios where various numbers of SSR alleles were not scored) and by misscoring other alleles (establishing scenarios where various numbers of SSR alleles were scored incorrectly). We also combined the scenarios of missing data and wrongly scored data. Table 3 contains a summary of the results of making these modifications in the data. For all modifications we used data from all SSR loci and we also randomly chose SSR loci to create subsets of 50 and 100 loci. In each case, the program was run 20 times for each hybrid/set of loci. When all 195 loci were examined, replications differed only according to the particular choice of alleles for loci where more than two alleles had been scored.

To evaluate robustness in the face of missing data or mistyped data, we simulated individual and combined categories of these data in the hybrid and all inbred lines at levels of 2, 5, 10, and 25% of the alleles for each of five hybrids and all inbreds beyond the level of error as originally scored by the laboratory. We examined the effects of these levels and types of error for three sizes of database: 50 loci, 100 loci, and all 195 scored loci. The same five hybrids considered in Table 2 were investigated: 3417, 3525, 3556, 3905, and 3940. One of these hybrids (3905) was chosen because one of its parents did not rank among the top two places even when the complete and unmodified data from all SSR loci were used.
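The kind of data degradation described above can be sketched as a small simulation. The text does not say how alleles were selected for removal or how replacement alleles were chosen, so this sketch makes its own assumptions: each scored allele is independently dropped with probability missing_rate or replaced by a different known allele with probability misscore_rate. The function name, the profile encoding, and the example loci are all hypothetical.

```python
import random


def corrupt_profile(profile, known_alleles, missing_rate, misscore_rate, rng):
    """Return a copy of profile with alleles dropped or misscored at random.

    profile: dict mapping locus -> tuple of scored alleles.
    known_alleles: dict mapping locus -> list of alleles known at that locus.
    """
    corrupted = {}
    for locus, alleles in profile.items():
        new = []
        for allele in alleles:
            r = rng.random()
            if r < missing_rate:
                new.append(None)  # allele not scored
            elif r < missing_rate + misscore_rate:
                others = [a for a in known_alleles[locus] if a != allele]
                new.append(rng.choice(others) if others else allele)
            else:
                new.append(allele)  # allele left as originally scored
        corrupted[locus] = tuple(new)
    return corrupted


rng = random.Random(0)  # fixed seed so repeated runs are reproducible
profile = {"ssr1": ("a", "b"), "ssr2": ("c", "c")}
known = {"ssr1": ["a", "b", "c"], "ssr2": ["a", "b", "c", "d", "e"]}
print(corrupt_profile(profile, known, missing_rate=0.10, misscore_rate=0.10, rng=rng))
```

Running the ancestry calculation on many such corrupted copies, with different seeds, would give the replicate results summarized per hybrid and per set of loci.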

Examples of robustness in the face of additional error for five hybrids using subsets of 50 and 100 loci and all loci are shown in Table 3, where numbers of parents ranking into the top two places are presented. Degradation in the preferential ranking of parent inbreds at a level of 25% additional missing data was shown for one hybrid (3525) with usage of 50, 100, or all SSR loci. Degradation in the preferential ranking of parent inbreds at a level of 25% additional misscored data was shown for hybrid 3556. When both additional levels of missing and misscored data were simulated, degradation in the ability to preferentially rank inbred parents occurred for all hybrids and for all sets of SSRs (50, 100, and 195 loci) except for hybrid 3417 when data from 195 SSR loci were used. Over all five hybrids, use of 100 loci improved robustness from the use of 50 loci; use of 195 loci further improved robustness for four hybrids (3417, 3525, 3905, and 3940). The degree of improvement was small, except for hybrid 3905.

We also ranked inbreds according to their probability of ancestry of hybrids when both parents and all inbred derivatives and full-sister inbreds of the respective inbred parents for each hybrid were excluded from the analysis. The results are too voluminous to present here but can be summarized as follows: Using P = 0.50, a grandparent of each respective hybrid ranked into first place for 41 (76%) hybrids; probabilities ranged from 0.4976 to 1.0 and most were above 0.9999. Other classes of inbreds that ranked in first position for probability of ancestry were inbreds derived directly by pedigree from a grandparent of the respective hybrid (DGP) for 13% of hybrids, inbreds derived directly by pedigree from a great-grandparent of the respective hybrid (DGGP) for 9% of hybrids, and one class (2% of hybrids) with an inbred ranked into first place that was directly related by pedigree to the great-great-grandparent of that hybrid. Inbreds that ranked in second position were related to the respective parents of the hybrid as follows: Thirty-one (57% of hybrids) were a grandparent of the respective hybrid, 11 (20%) were classed as DGP, 7 (13%) were DGGP, 1 (2%) was class DGGGP, and 4 (7%) were a great-grandparent (GGP) of the respective hybrid. Over all hybrids, two of the four grandparents ranked into first and second positions for 23 (43% of hybrids); three grandparents ranked into the first three positions for 5 (9% of hybrids). There were no instances where all four grandparents ranked into the first four positions. Thirty hybrids had a grandparent ranked into first position using P = 0.99. The number of grandparents ranked into the top five positions was 93 (compared to 108 when P = 0.50). The number of grandparents ranking into the top two positions was 55 (compared to 71 when P = 0.50). The mean probability of a grandparent that ranked into the first two positions was 0.9288 (SD = 0.1434) when P = 0.50 and 0.9980 (SD = 0.0104) when P = 0.99.

DISCUSSION

The prevalent use of paternity indices demonstrates that it is advantageous to have explicit probabilities of ancestry to distinguish among different pedigrees. Molecular marker profiles are rapidly becoming more extensive and cost effective to generate. Features that would advance the statistical analysis of molecular marker data to provide explicit probabilities of ancestry include the ability to calculate probabilities of ancestry where there is no a priori information as to the identity of one (usually the maternal) parent and robustness in the face of laboratory error.

Maize inbred lines and hybrids provide a very exacting set of materials for evaluating the discrimination abilities of molecular data and statistical procedures that are employed to interpret those data. Hundreds of maize inbred lines of known pedigree together encompass a great diversity and complexity of pedigree relationships. Some inbred lines can be very highly related and genetically similar due to their derivation from common parentage including from parents that are themselves highly related. Consequently, relationship categories such as "sister" or "parent" when applied to maize inbreds usually refer to closer degrees of pedigree relationship and, thus, of germplasm and molecular marker profile similarity than those of the equivalently named classes of relationship for animal species. Most maize hybrids that are widely used in the United States today are constructed from pairs of inbred lines that are unrelated by pedigree, each inbred parent having been bred from a separate "pool" of germplasm. Various degrees of relatedness are possible between hybrids according to the pedigree relationships among their constituent inbred parents.

Using P = 0.99 in the algorithm is more specific for identifying parents than using P = 0.50. However, P = 0.99 is less robust for identifying other relatives, such as grandparents. When the algorithm was run at P = 0.50 there were 6 hybrids for which one parent did not rank among the top two most probable genotypes. For the remaining 48 hybrids the correct parents were identified even in circumstances where other candidate inbreds included not only full-sister lines bred from related parents but also inbreds even more closely related to the true parent by virtue of being backcross conversions of the inbred parent of the hybrid. For each of the 6 hybrids where a nonparent ranked above a true parent, that higher ranked inbred was always either a sister or progeny of the outranked true parent. The range of pedigree relationships, as expressed by the Malécot coefficient of relatedness (Malécot 1948), that was encompassed by pairs of true parents and more highly ranked inbred relatives of the true parents was from [value illegible] to [value illegible]. A coefficient of [value illegible] approximates a relationship between inbred A and A' where inbred A' has been bred from a cross of inbreds A and B with between one and two additional backcrosses of the parental inbred A. A Malécot coefficient of relationship of [value illegible] closely approximates a relationship between inbreds A and A' where four additional backcrosses of parental inbred A follow the initial cross of inbreds A and B.
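As a rough guide to the magnitudes involved, the expected coefficient under an idealized backcross scheme can be computed directly. This sketch assumes that inbreds A and B are unrelated and fully inbred and that no selection acts during backcrossing, so the expected share of A's genome (and hence the coefficient with A) is 1 - (1/2)^(k+1) after the initial cross and k additional backcrosses. The function name is hypothetical, and coefficients computed from real pedigrees, such as those reported in the text, need not match these idealized values.

```python
def expected_coefficient(n_backcrosses):
    """Idealized coefficient between A and a line from A x B plus k backcrosses to A."""
    return 1.0 - 0.5 ** (n_backcrosses + 1)


for k in range(5):
    print(k, expected_coefficient(k))
```

Under these assumptions one to two additional backcrosses give values between 0.75 and 0.875, and four additional backcrosses give 0.96875.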

Running the algorithm at P = 0.99 in comparison to P = 0.50 raises the probability of ancestry for the parents while diminishing the probabilities for the third and lower ranking candidate inbred lines. Use of the algorithm at P = 0.99 increased both the percentage of hybrids with both parents ranked in the first two positions (from 89 to 96%) and the percentage of parental inbreds that were ranked first and second (from 94 to 98%). Two hybrids (3914 and X0915A) did not have both parents ranked first and second when the algorithm was run at P = 0.99. For both of these hybrids the nonparental inbred that outranked the true parent was itself a product by pedigree from the true parent that had been created by an additional four backcrosses of that parent; the Malécot coefficient of relationship between the parent of the hybrid and the inbred that outranked that parent for these two hybrids was 0.9636.

Robustness was tested by evaluating the effects of using data from different numbers of loci and by simulating additional levels of missing and misscored data up to combined levels of 25% error beyond that which was provided by the laboratory. From our experience, error rates of 5 to 10% can occur in SSR profiling of maize due chiefly to the combined effects of residual heterozygosity among seed lots and deficiencies in the scoring of heterozygotes in hybrids. The additional levels of simulated error, therefore, include values (up to ~35% total error) that are well outside of our experience. For five hybrids that were examined, increasing the number of loci from 50 to 100 (with no additional missing or misscored data) did reduce the number of instances where inbreds that were not parents of a hybrid outranked the true parent from four to one. Nonetheless, all of these more highly ranked inbreds, although they were not themselves the true parents of the respective hybrid, were either direct progeny or full sisters of the true parent (Table 2). Consequently, if such degrees of error can be tolerated in respect of pedigrees for inbreds that are identified as parents of hybrids, then SSR data from 50 loci of equivalent discrimination ability are sufficient. Use of data from 50 loci also evidenced robustness in the face of up to 10% additional levels of either missing or misscored data; no degradation in the ability to identify a parent was apparent up to the level of 10% additional error except for 10% additional missing and misscored alleles for one hybrid (3525; Table 3). However, use of 100 loci increased the proportion of true parents that were correctly identified from [value illegible] (for 50 loci) to 71% (mean correct parents over all levels of error; Table 3). Use of data from 195 loci provided greater resiliency against additional levels of error. However, use of data from 195 loci was unable to provide resiliency against the negative effects of adding combined levels (at 25%) of both missing and misscored data (Table 3). At the 25% level of additional poor data integrity, inbreds that were not related to the true parent of the hybrid outranked the true parent for four of the five hybrids. Levels of missing or misscored data should, therefore, be kept below 15-20% (assuming a level of 5-10% error in the data we analyzed prior to simulating additional error).

We have previously examined the pedigrees of inbreds
that are ranked into the first two positions when
the true parents are removed from the list of candidate
inbred lines. Usually, direct progeny or full sisters of
the true parents then rank most highly (data not
presented). We therefore examined the rankings of inbreds
with respect to their ranking and probability of inclusion
in the ancestry of each hybrid after the removal, not
only of the true parents, but also of the progeny of the
true parents and any full sisters of the true parents. In
these circumstances the grandparents of the hybrids are
ranked predominantly into top positions. Using P =
0.50, a grandparent ranked into first position for 76% of
hybrids and into second position for [illegible]% of hybrids; with
P = 0.99 a grandparent ranked into first place in 56%
of hybrids. At P = 0.50 two grandparents ranked into
first and second positions for 43% of hybrids and into the
first three positions for an additional 9% of hybrids. Most
of the remaining inbreds that ranked into the top two
positions were progeny of the grandparent. A total of
108 grandparents ranked into the top five positions
when P = 0.50; 93 ranked into these positions when P =
0.99. Seventy-one grandparents ranked into the top two
positions when P = 0.50; 55 grandparents ranked into
these positions when P = 0.99. The mean probability
of a grandparent in the top two positions was 0.9288
(SD 0.1434) when P = 0.50 and 0.9980 (SD 0.0104)
when P = 0.99. Our algorithm was written to identify
pairs of ancestors; alternative algorithms could be
tailored to identify all grandparents once parents had been
identified and removed from the list of candidate
inbreds.
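
The removal step described above can be sketched as a filter over the
candidate list followed by a re-ranking. This is an illustrative sketch under
stated assumptions, not the authors' implementation: the function name, the
`pedigree` mapping (inbred -> its two parents, or None when unknown), and the
use of a single score per candidate are hypothetical.

```python
def rank_without_close_relatives(scores, pedigree, true_parents):
    """Rank candidates by score after dropping the true parents,
    their direct progeny, and their full sisters.

    scores: dict mapping inbred name -> probability-like score.
    pedigree: dict mapping inbred name -> (parent1, parent2) or None.
    true_parents: iterable of the hybrid's two known parents.
    (All names and the data layout are hypothetical.)
    """
    true_parents = set(true_parents)
    excluded = set(true_parents)
    for inbred, parents in pedigree.items():
        if not parents:
            continue
        if true_parents & set(parents):
            # Direct progeny of a true parent.
            excluded.add(inbred)
        for tp in true_parents:
            tp_parents = pedigree.get(tp)
            if tp_parents and set(parents) == set(tp_parents):
                # Full sister: derived from the same two parents.
                excluded.add(inbred)
    remaining = [name for name in scores if name not in excluded]
    return sorted(remaining, key=lambda name: scores[name], reverse=True)
```

With the parents, progeny, and full sisters filtered out, the grandparents
would be expected to occupy the top of the returned list.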

We have demonstrated the capability and robustness
of an algorithm that can be used to show probability of
parentage in circumstances where the a priori pedigree
identity of neither parent is known. Exclusions are taken
into account, thereby allowing parentage to be shown
even when the two parents are not represented in the
database of molecular profiles that are examined.
Heterozygous candidate parents can be accommodated.
The number of loci that is necessary to provide a reliable
basis of determining pedigree is dependent upon the
degree of relatedness among parents and nonparents
and upon the discriminatory ability of the marker system
in the species of interest. Using P = 0.99 compared to
0.50 preferentially identified more true parents and
with a greater difference of probability to third placed
nonparents. If there is reasonable assurance that the
parents are among the candidate list of inbreds, then
P = 0.99 should be used; if greater robustness is
required, then P = 0.50 should be used.
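
The idea of an exclusion can be illustrated with a short sketch. It is not
the probability algorithm described in this paper: it only counts the loci at
which a candidate pair could not have produced the hybrid's genotype. The
function name and the representation of a genotype as a tuple of two alleles
per locus (so that heterozygous candidates are handled) are assumptions.

```python
from itertools import product


def count_exclusions(hybrid, cand_a, cand_b):
    """Count loci at which candidates A and B cannot jointly be the parents.

    Each argument maps locus name -> tuple of two alleles, or None if the
    genotype is missing. A locus is an exclusion when no combination of one
    allele from A and one allele from B reproduces the hybrid's genotype.
    (Names and data layout are hypothetical.)
    """
    exclusions = 0
    for locus, h in hybrid.items():
        a = cand_a.get(locus)
        b = cand_b.get(locus)
        if h is None or a is None or b is None:
            # Missing data cannot exclude a candidate.
            continue
        possible = {tuple(sorted(pair)) for pair in product(a, b)}
        if tuple(sorted(h)) not in possible:
            exclusions += 1
    return exclusions
```

A pair with zero exclusions over many loci remains a plausible pair of
parents; a pair with many exclusions is unlikely to be, although misscored
data can also generate exclusions.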

Applications of our algorithm include the identification
of pedigrees among individuals of plant or animal
species where molecular profile datasets exist that can
be interpreted in terms of segregating alleles at individual
marker loci and that provide a sufficient power of
discrimination. Capabilities to generate large datasets
of suitable molecular profile data are already available
and are increasing rapidly with the advent of single
nucleotide polymorphisms. One further application of
our algorithm is to assist in the protection of intellectual
property that is obtained on plant varieties, or upon
specific dams or sires of animals through the determination
of pedigrees.
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